diff options
Diffstat (limited to 'src/lib/bstrlib/bstrlib.txt')
| -rw-r--r-- | src/lib/bstrlib/bstrlib.txt | 3547 |
1 files changed, 3547 insertions, 0 deletions
diff --git a/src/lib/bstrlib/bstrlib.txt b/src/lib/bstrlib/bstrlib.txt new file mode 100644 index 0000000..26fc927 --- /dev/null +++ b/src/lib/bstrlib/bstrlib.txt @@ -0,0 +1,3547 @@ +Better String library +--------------------- + +by Paul Hsieh + +The bstring library is an attempt to provide improved string processing +functionality to the C and C++ language. At the heart of the bstring library +(Bstrlib for short) is the management of "bstring"s which are a significant +improvement over '\0' terminated char buffers. + +=============================================================================== + +Motivation +---------- + +The standard C string library has serious problems: + + 1) Its use of '\0' to denote the end of the string means knowing a + string's length is O(n) when it could be O(1). + 2) It imposes an interpretation for the character value '\0'. + 3) gets() always exposes the application to a buffer overflow. + 4) strtok() modifies the string its parsing and thus may not be usable in + programs which are re-entrant or multithreaded. + 5) fgets has the unusual semantic of ignoring '\0's that occur before + '\n's are consumed. + 6) There is no memory management, and actions performed such as strcpy, + strcat and sprintf are common places for buffer overflows. + 7) strncpy() doesn't '\0' terminate the destination in some cases. + 8) Passing NULL to C library string functions causes an undefined NULL + pointer access. + 9) Parameter aliasing (overlapping, or self-referencing parameters) + within most C library functions has undefined behavior. + 10) Many C library string function calls take integer parameters with + restricted legal ranges. Parameters passed outside these ranges are + not typically detected and cause undefined behavior. + +So the desire is to create an alternative string library that does not suffer +from the above problems and adds in the following functionality: + + 1) Incorporate string functionality seen from other languages. + a) MID$() - from BASIC + b) split()/join() - from Python + c) string/char x n - from Perl + 2) Implement analogs to functions that combine stream IO and char buffers + without creating a dependency on stream IO functionality. + 3) Implement the basic text editor-style functions insert, delete, find, + and replace. + 4) Implement reference based sub-string access (as a generalization of + pointer arithmetic.) + 5) Implement runtime write protection for strings. + +There is also a desire to avoid "API-bloat". So functionality that can be +implemented trivially in other functionality is omitted. So there is no +left$() or right$() or reverse() or anything like that as part of the core +functionality. + +Explaining Bstrings +------------------- + +A bstring is basically a header which wraps a pointer to a char buffer. Lets +start with the declaration of a struct tagbstring: + + struct tagbstring { + int mlen; + int slen; + unsigned char * data; + }; + +This definition is considered exposed, not opaque (though it is neither +necessary nor recommended that low level maintenance of bstrings be performed +whenever the abstract interfaces are sufficient). The mlen field (usually) +describes a lower bound for the memory allocated for the data field. The +slen field describes the exact length for the bstring. The data field is a +single contiguous buffer of unsigned chars. Note that the existence of a '\0' +character in the unsigned char buffer pointed to by the data field does not +necessarily denote the end of the bstring. + +To be a well formed modifiable bstring the mlen field must be at least the +length of the slen field, and slen must be non-negative. Furthermore, the +data field must point to a valid buffer in which access to the first mlen +characters has been acquired. So the minimal check for correctness is: + + (slen >= 0 && mlen >= slen && data != NULL) + +bstrings returned by bstring functions can be assumed to be either NULL or +satisfy the above property. (When bstrings are only readable, the mlen >= +slen restriction is not required; this is discussed later in this section.) +A bstring itself is just a pointer to a struct tagbstring: + + typedef struct tagbstring * bstring; + +Note that use of the prefix "tag" in struct tagbstring is required to work +around the inconsistency between C and C++'s struct namespace usage. This +definition is also considered exposed. + +Bstrlib basically manages bstrings allocated as a header and an associated +data-buffer. Since the implementation is exposed, they can also be +constructed manually. Functions which mutate bstrings assume that the header +and data buffer have been malloced; the bstring library may perform free() or +realloc() on both the header and data buffer of any bstring parameter. +Functions which return bstring's create new bstrings. The string memory is +freed by a bdestroy() call (or using the bstrFree macro). + +The following related typedef is also provided: + + typedef const struct tagbstring * const_bstring; + +which is also considered exposed. These are directly bstring compatible (no +casting required) but are just used for parameters which are meant to be +non-mutable. So in general, bstring parameters which are read as input but +not meant to be modified will be declared as const_bstring, and bstring +parameters which may be modified will be declared as bstring. This convention +is recommended for user written functions as well. + +Since bstrings maintain interoperability with C library char-buffer style +strings, all functions which modify, update or create bstrings also append a +'\0' character into the position slen + 1. This trailing '\0' character is +not required for bstrings input to the bstring functions; this is provided +solely as a convenience for interoperability with standard C char-buffer +functionality. + +Analogs for the ANSI C string library functions have been created when they +are necessary, but have also been left out when they are not. In particular +there are no functions analogous to fwrite, or puts just for the purposes of +bstring. The ->data member of any string is exposed, and therefore can be +used just as easily as char buffers for C functions which read strings. + +For those that wish to hand construct bstrings, the following should be kept +in mind: + + 1) While bstrlib can accept constructed bstrings without terminating + '\0' characters, the rest of the C language string library will not + function properly on such non-terminated strings. This is obvious + but must be kept in mind. + 2) If it is intended that a constructed bstring be written to by the + bstring library functions then the data portion should be allocated + by the malloc function and the slen and mlen fields should be entered + properly. The struct tagbstring header is not reallocated, and only + freed by bdestroy. + 3) Writing arbitrary '\0' characters at various places in the string + will not modify its length as perceived by the bstring library + functions. In fact, '\0' is a legitimate non-terminating character + for a bstring to contain. + 4) For read only parameters, bstring functions do not check the mlen. + I.e., the minimal correctness requirements are reduced to: + + (slen >= 0 && data != NULL) + +Better pointer arithmetic +------------------------- + +One built-in feature of '\0' terminated char * strings, is that its very easy +and fast to obtain a reference to the tail of any string using pointer +arithmetic. Bstrlib does one better by providing a way to get a reference to +any substring of a bstring (or any other length delimited block of memory.) +So rather than just having pointer arithmetic, with bstrlib one essentially +has segment arithmetic. This is achieved using the macro blk2tbstr() which +builds a reference to a block of memory and the macro bmid2tbstr() which +builds a reference to a segment of a bstring. Bstrlib also includes +functions for direct consumption of memory blocks into bstrings, namely +bcatblk () and blk2bstr (). + +One scenario where this can be extremely useful is when string contains many +substrings which one would like to pass as read-only reference parameters to +some string consuming function without the need to allocate entire new +containers for the string data. More concretely, imagine parsing a command +line string whose parameters are space delimited. This can only be done for +tails of the string with '\0' terminated char * strings. + +Improved NULL semantics and error handling +------------------------------------------ + +Unless otherwise noted, if a NULL pointer is passed as a bstring or any other +detectably illegal parameter, the called function will return with an error +indicator (either NULL or BSTR_ERR) rather than simply performing a NULL +pointer access, or having undefined behavior. + +To illustrate the value of this, consider the following example: + + strcpy (p = malloc (13 * sizeof (char)), "Hello,"); + strcat (p, " World"); + +This is not correct because malloc may return NULL (due to an out of memory +condition), and the behaviour of strcpy is undefined if either of its +parameters are NULL. However: + + bstrcat (p = bfromcstr ("Hello,"), q = bfromcstr (" World")); + bdestroy (q); + +is well defined, because if either p or q are assigned NULL (indicating a +failure to allocate memory) both bstrcat and bdestroy will recognize it and +perform no detrimental action. + +Note that it is not necessary to check any of the members of a returned +bstring for internal correctness (in particular the data member does not need +to be checked against NULL when the header is non-NULL), since this is +assured by the bstring library itself. + +bStreams +-------- + +In addition to the bgets and bread functions, bstrlib can abstract streams +with a high performance read only stream called a bStream. In general, the +idea is to open a core stream (with something like fopen) then pass its +handle as well as a bNread function pointer (like fread) to the bsopen +function which will return a handle to an open bStream. Then the functions +bsread, bsreadln or bsreadlns can be called to read portions of the stream. +Finally, the bsclose function is called to close the bStream -- it will +return a handle to the original (core) stream. So bStreams, essentially, +wrap other streams. + +The bStreams have two main advantages over the bgets and bread (as well as +fgets/ungetc) paradigms: + +1) Improved functionality via the bunread function which allows a stream to + unread characters, giving the bStream stack-like functionality if so + desired. +2) A very high performance bsreadln function. The C library function fgets() + (and the bgets function) can typically be written as a loop on top of + fgetc(), thus paying all of the overhead costs of calling fgetc on a per + character basis. bsreadln will read blocks at a time, thus amortizing the + overhead of fread calls over many characters at once. + +However, clearly bStreams are suboptimal or unusable for certain kinds of +streams (stdin) or certain usage patterns (a few spotty, or non-sequential +reads from a slow stream.) For those situations, using bgets will be more +appropriate. + +The semantics of bStreams allows practical construction of layerable data +streams. What this means is that by writing a bNread compatible function on +top of a bStream, one can construct a new bStream on top of it. This can be +useful for writing multi-pass parsers that don't actually read the entire +input more than once and don't require the use of intermediate storage. + +Aliasing +-------- + +Aliasing occurs when a function is given two parameters which point to data +structures which overlap in the memory they occupy. While this does not +disturb read only functions, for many libraries this can make functions that +write to these memory locations malfunction. This is a common problem of the +C standard library and especially the string functions in the C standard +library. + +The C standard string library is entirely char by char oriented (as is +bstring) which makes conforming implementations alias safe for some +scenarios. However no actual detection of aliasing is typically performed, +so it is easy to find cases where the aliasing will cause anomolous or +undesirable behaviour (consider: strcat (p, p).) The C99 standard includes +the "restrict" pointer modifier which allows the compiler to document and +assume a no-alias condition on usage. However, only the most trivial cases +can be caught (if at all) by the compiler at compile time, and thus there is +no actual enforcement of non-aliasing. + +Bstrlib, by contrast, permits aliasing and is completely aliasing safe, in +the C99 sense of aliasing. That is to say, under the assumption that +pointers of incompatible types from distinct objects can never alias, bstrlib +is completely aliasing safe. (In practice this means that the data buffer +portion of any bstring and header of any bstring are assumed to never alias.) +With the exception of the reference building macros, the library behaves as +if all read-only parameters are first copied and replaced by temporary +non-aliased parameters before any writing to any output bstring is performed +(though actual copying is extremely rarely ever done.) + +Besides being a useful safety feature, bstring searching/comparison +functions can improve to O(1) execution when aliasing is detected. + +Note that aliasing detection and handling code in Bstrlib is generally +extremely cheap. There is almost never any appreciable performance penalty +for using aliased parameters. + +Reenterancy +----------- + +Nearly every function in Bstrlib is a leaf function, and is completely +reenterable with the exception of writing to common bstrings. The split +functions which use a callback mechanism requires only that the source string +not be destroyed by the callback function unless the callback function returns +with an error status (note that Bstrlib functions which return an error do +not modify the string in any way.) The string can in fact be modified by the +callback and the behaviour is deterministic. See the documentation of the +various split functions for more details. + +Undefined scenarios +------------------- + +One of the basic important premises for Bstrlib is to not to increase the +propogation of undefined situations from parameters that are otherwise legal +in of themselves. In particular, except for extremely marginal cases, usages +of bstrings that use the bstring library functions alone cannot lead to any +undefined action. But due to C/C++ language and library limitations, there +is no way to define a non-trivial library that is completely without +undefined operations. All such possible undefined operations are described +below: + +1) bstrings or struct tagbstrings that are not explicitely initialized cannot + be passed as a parameter to any bstring function. +2) The members of the NULL bstring cannot be accessed directly. (Though all + APIs and macros detect the NULL bstring.) +3) A bstring whose data member has not been obtained from a malloc or + compatible call and which is write accessible passed as a writable + parameter will lead to undefined results. (i.e., do not writeAllow any + constructed bstrings unless the data portion has been obtained from the + heap.) +4) If the headers of two strings alias but are not identical (which can only + happen via a defective manual construction), then passing them to a + bstring function in which one is writable is not defined. +5) If the mlen member is larger than the actual accessible length of the data + member for a writable bstring, or if the slen member is larger than the + readable length of the data member for a readable bstring, then the + corresponding bstring operations are undefined. +6) Any bstring definition whose header or accessible data portion has been + assigned to inaccessible or otherwise illegal memory clearly cannot be + acted upon by the bstring library in any way. +7) Destroying the source of an incremental split from within the callback + and not returning with a negative value (indicating that it should abort) + will lead to undefined behaviour. (Though *modifying* or adjusting the + state of the source data, even if those modification fail within the + bstrlib API, has well defined behavior.) +8) Modifying a bstring which is write protected by direct access has + undefined behavior. + +While this may seem like a long list, with the exception of invalid uses of +the writeAllow macro, and source destruction during an iterative split +without an accompanying abort, no usage of the bstring API alone can cause +any undefined scenario to occurr. I.e., the policy of restricting usage of +bstrings to the bstring API can significantly reduce the risk of runtime +errors (in practice it should eliminate them) related to string manipulation +due to undefined action. + +C++ wrapper +----------- + +A C++ wrapper has been created to enable bstring functionality for C++ in the +most natural (for C++ programers) way possible. The mandate for the C++ +wrapper is different from the base C bstring library. Since the C++ language +has far more abstracting capabilities, the CBString structure is considered +fully abstracted -- i.e., hand generated CBStrings are not supported (though +conversion from a struct tagbstring is allowed) and all detectable errors are +manifest as thrown exceptions. + +- The C++ class definitions are all under the namespace Bstrlib. bstrwrap.h + enables this namespace (with a using namespace Bstrlib; directive at the + end) unless the macro BSTRLIB_DONT_ASSUME_NAMESPACE has been defined before + it is included. + +- Erroneous accesses results in an exception being thrown. The exception + parameter is of type "struct CBStringException" which is derived from + std::exception if STL is used. A verbose description of the error message + can be obtained from the what() method. + +- CBString is a C++ structure derived from a struct tagbstring. An address + of a CBString cast to a bstring must not be passed to bdestroy. The bstring + C API has been made C++ safe and can be used directly in a C++ project. + +- It includes constructors which can take a char, '\0' terminated char + buffer, tagbstring, (char, repeat-value), a length delimited buffer or a + CBStringList to initialize it. + +- Concatenation is performed with the + and += operators. Comparisons are + done with the ==, !=, <, >, <= and >= operators. Note that == and != use + the biseq call, while <, >, <= and >= use bstrcmp. + +- CBString's can be directly cast to const character buffers. + +- CBString's can be directly cast to double, float, int or unsigned int so + long as the CBString are decimal representations of those types (otherwise + an exception will be thrown). Converting the other way should be done with + the format(a) method(s). + +- CBString contains the length, character and [] accessor methods. The + character and [] accessors are aliases of each other. If the bounds for + the string are exceeded, an exception is thrown. To avoid the overhead for + this check, first cast the CBString to a (const char *) and use [] to + dereference the array as normal. Note that the character and [] accessor + methods allows both reading and writing of individual characters. + +- The methods: format, formata, find, reversefind, findcaseless, + reversefindcaseless, midstr, insert, insertchrs, replace, findreplace, + findreplacecaseless, remove, findchr, nfindchr, alloc, toupper, tolower, + gets, read are analogous to the functions that can be found in the C API. + +- The caselessEqual and caselessCmp methods are analogous to biseqcaseless + and bstricmp functions respectively. + +- Note that just like the bformat function, the format and formata methods do + not automatically cast CBStrings into char * strings for "%s"-type + substitutions: + + CBString w("world"); + CBString h("Hello"); + CBString hw; + + /* The casts are necessary */ + hw.format ("%s, %s", (const char *)h, (const char *)w); + +- The methods trunc and repeat have been added instead of using pattern. + +- ltrim, rtrim and trim methods have been added. These remove characters + from a given character string set (defaulting to the whitespace characters) + from either the left, right or both ends of the CBString, respectively. + +- The method setsubstr is also analogous in functionality to bsetstr, except + that it cannot be passed NULL. Instead the method fill and the fill-style + constructor have been supplied to enable this functionality. + +- The writeprotect(), writeallow() and iswriteprotected() methods are + analogous to the bwriteprotect(), bwriteallow() and biswriteprotected() + macros in the C API. Write protection semantics in CBString are stronger + than with the C API in that indexed character assignment is checked for + write protection. However, unlike with the C API, a write protected + CBString can be destroyed by the destructor. + +- CBStream is a C++ structure which wraps a struct bStream (its not derived + from it, since destruction is slightly different). It is constructed by + passing in a bNread function pointer and a stream parameter cast to void *. + This structure includes methods for detecting eof, setting the buffer + length, reading the whole stream or reading entries line by line or block + by block, an unread function, and a peek function. + +- If STL is available, the CBStringList structure is derived from a vector of + CBString with various split methods. The split method has been overloaded + to accept either a character or CBString as the second parameter (when the + split parameter is a CBString any character in that CBString is used as a + seperator). The splitstr method takes a CBString as a substring seperator. + Joins can be performed via a CBString constructor which takes a + CBStringList as a parameter, or just using the CBString::join() method. + +- If there is proper support for std::iostreams, then the >> and << operators + and the getline() function have been added (with semantics the same as + those for std::string). + +Multithreading +-------------- + +A mutable bstring is kind of analogous to a small (two entry) linked list +allocated by malloc, with all aliasing completely under programmer control. +I.e., manipulation of one bstring will never affect any other distinct +bstring unless explicitely constructed to do so by the programmer via hand +construction or via building a reference. Bstrlib also does not use any +static or global storage, so there are no hidden unremovable race conditions. +Bstrings are also clearly not inherently thread local. So just like +char *'s, bstrings can be passed around from thread to thread and shared and +so on, so long as modifications to a bstring correspond to some kind of +exclusive access lock as should be expected (or if the bstring is read-only, +which can be enforced by bstring write protection) for any sort of shared +object in a multithreaded environment. + +Bsafe module +------------ + +For convenience, a bsafe module has been included. The idea is that if this +module is included, inadvertant usage of the most dangerous C functions will +be overridden and lead to an immediate run time abort. Of course, it should +be emphasized that usage of this module is completely optional. The +intention is essentially to provide an option for creating project safety +rules which can be enforced mechanically rather than socially. This is +useful for larger, or open development projects where its more difficult to +enforce social rules or "coding conventions". + +Problems not solved +------------------- + +Bstrlib is written for the C and C++ languages, which have inherent weaknesses +that cannot be easily solved: + +1. Memory leaks: Forgetting to call bdestroy on a bstring that is about to be + unreferenced, just as forgetting to call free on a heap buffer that is + about to be dereferenced. Though bstrlib itself is leak free. +2. Read before write usage: In C, declaring an auto bstring does not + automatically fill it with legal/valid contents. This problem has been + somewhat mitigated in C++. (The bstrDeclare and bstrFree macros from + bstraux can be used to help mitigate this problem.) + +Other problems not addressed: + +3. Built-in mutex usage to automatically avoid all bstring internal race + conditions in multitasking environments: The problem with trying to + implement such things at this low a level is that it is typically more + efficient to use locks in higher level primitives. There is also no + platform independent way to implement locks or mutexes. + +Note that except for spotty support of wide characters, the default C +standard library does not address any of these problems either. + +Configurable compilation options +-------------------------------- + +The Better String Library is not an application, it is a library. To compile +it, you need to compile bstrlib.c to an object file that is linked to your +application. A Makefile might contain entries such as the following to +accomplish this: + +BSTRDIR = $(CDIR)/bstrlib +INCLUDES = -I$(BSTRDIR) +BSTROBJS = $(ODIR)/bstrlib.o +DEFINES = +CFLAGS = -O3 -Wall -pedantic -ansi -s $(DEFINES) + +application: $(ODIR)/main.o $(BSTROBJS) + echo Linking: $@ + $(CC) $< $(BSTROBJS) -o $@ + +$(ODIR)/%.o : $(BSTRDIR)/%.c + echo Compiling: $< + $(CC) $(CFLAGS) $(INCLUDES) -c $< -o $@ + +$(ODIR)/%.o : %.c + echo Compiling: $< + $(CC) $(CFLAGS) $(INCLUDES) -c $< -o $@ + +You can configure bstrlib using with the standard macro defines passed to +the compiler. All configuration options are meant solely for the purpose of +compiler compatibility. Configuration options are not meant to change the +semantics or capabilities of the library, except where it is unavoidable. + +Since some C++ compilers don't include the Standard Template Library and some +have the options of disabling exception handling, a number of macros can be +used to conditionally compile support for each of this: + +BSTRLIB_CAN_USE_STL + + - defining this will enable the used of the Standard Template Library. + Defining BSTRLIB_CAN_USE_STL overrides the BSTRLIB_CANNOT_USE_STL macro. + +BSTRLIB_CANNOT_USE_STL + + - defining this will disable the use of the Standard Template Library. + Defining BSTRLIB_CAN_USE_STL overrides the BSTRLIB_CANNOT_USE_STL macro. + +BSTRLIB_CAN_USE_IOSTREAM + + - defining this will enable the used of streams from class std. Defining + BSTRLIB_CAN_USE_IOSTREAM overrides the BSTRLIB_CANNOT_USE_IOSTREAM macro. + +BSTRLIB_CANNOT_USE_IOSTREAM + + - defining this will disable the use of streams from class std. Defining + BSTRLIB_CAN_USE_IOSTREAM overrides the BSTRLIB_CANNOT_USE_IOSTREAM macro. + +BSTRLIB_THROWS_EXCEPTIONS + + - defining this will enable the exception handling within bstring. + Defining BSTRLIB_THROWS_EXCEPTIONS overrides the + BSTRLIB_DOESNT_THROWS_EXCEPTIONS macro. + +BSTRLIB_DOESNT_THROW_EXCEPTIONS + + - defining this will disable the exception handling within bstring. + Defining BSTRLIB_THROWS_EXCEPTIONS overrides the + BSTRLIB_DOESNT_THROW_EXCEPTIONS macro. + +Note that these macros must be defined consistently throughout all modules +that use CBStrings including bstrwrap.cpp. + +Some older C compilers do not support functions such as vsnprintf. This is +handled by the following macro variables: + +BSTRLIB_NOVSNP + + - defining this indicates that the compiler does not support vsnprintf. + This will cause bformat and bformata to not be declared. Note that + for some compilers, such as Turbo C, this is set automatically. + Defining BSTRLIB_NOVSNP overrides the BSTRLIB_VSNP_OK macro. + +BSTRLIB_VSNP_OK + + - defining this will disable the autodetection of compilers that do not + vsnprintf. + Defining BSTRLIB_NOVSNP overrides the BSTRLIB_VSNP_OK macro. + +Semantic compilation options +---------------------------- + +Bstrlib comes with very few compilation options for changing the semantics of +of the library. These are described below. + +BSTRLIB_DONT_ASSUME_NAMESPACE + + - Defining this before including bstrwrap.h will disable the automatic + enabling of the Bstrlib namespace for the C++ declarations. + +BSTRLIB_DONT_USE_VIRTUAL_DESTRUCTOR + + - Defining this will make the CBString destructor non-virtual. + +BSTRLIB_MEMORY_DEBUG + + - Defining this will cause the bstrlib modules bstrlib.c and bstrwrap.cpp + to invoke a #include "memdbg.h". memdbg.h has to be supplied by the user. + +Note that these macros must be defined consistently throughout all modules +that use bstrings or CBStrings including bstrlib.c, bstraux.c and +bstrwrap.cpp. + +Version +------- + +Current release: v1.0.0 + +The version format v[Major].[Minor].[Update] is used to facilitate +developers with backward compatibility in the core developer branch of the +Better String Library. This is also reflected in the macro symbols +BSTR_VER_MAJOR, BSTR_VER_MINOR and BSTR_VER_UPDATE in the bstrlib.h file. +Differences in the Major version imply that there has been a change in the +API, and that a recompile and usage source changes may be necessary. +Differences in Minor version imply that there has been an expansion of the +API, that backward compatibility should be preserved and that at most a +recompile is necessary (unless there is a namespace collision). Differences +in Update imply that no API change has occurred. + +Although ordered, there is no implication of lexical sequencing. In +particular, the Update number will not reset to 0 as the Major and Minor +version numbers increment. + +So simple bug fixes will usually be reflected in a change in the Update +number. If new functions are available, the Minor value will increment. +If any function changes its parameters, or if a function is removed, the +Major value will increment. + + +=============================================================================== + +Files +----- + +Core C files (required for C and C++): +bstrlib.c - C implementaion of bstring functions. +bstrlib.h - C header file for bstring functions. + +Core C++ files (required for C++): +bstrwrap.cpp - C++ implementation of CBString. +bstrwrap.h - C++ header file for CBString. + +Base Unicode support: +utf8util.c - C implemention of generic utf8 parsing functions. +utf8util.h - C head file for generic utf8 parsing functions. +buniutil.c - C implemention utf8 bstring packing and unpacking functions. +buniutil.h - C header file for utf8 bstring functions. + +Extra utility functions: +bstraux.c - C example that implements trivial additional functions. +bstraux.h - C header for bstraux.c + +Miscellaneous: +bstest.c - C unit/regression test for bstrlib.c +test.cpp - C++ unit/regression test for bstrwrap.cpp +bsafe.c - C runtime stubs to abort usage of unsafe C functions. +bsafe.h - C header file for bsafe.c functions. + +C modules need only include bstrlib.h and compile/link bstrlib.c to use the +basic bstring library. C++ projects need to additionally include bstrwrap.h +and compile/link bstrwrap.cpp. For both, there may be a need to make choices +about feature configuration as described in the "Configurable compilation +options" in the section above. + +Other files that are included in this archive are: + +license.txt - The BSD license for Bstrlib +gpl.txt - The GPL version 2 +security.txt - A security statement useful for auditting Bstrlib +porting.txt - A guide to porting Bstrlib +bstrlib.txt - This file + +=============================================================================== + +The functions +------------- + + extern bstring bfromcstr (const char * str); + + Take a standard C library style '\0' terminated char buffer and generate + a bstring with the same contents as the char buffer. If an error occurs + NULL is returned. + + So for example: + + bstring b = bfromcstr ("Hello"); + if (!b) { + fprintf (stderr, "Out of memory"); + } else { + puts ((char *) b->data); + } + + .......................................................................... + + extern bstring bfromcstralloc (int mlen, const char * str); + + Create a bstring which contains the contents of the '\0' terminated + char * buffer str. The memory buffer backing the bstring is at least + mlen characters in length. The buffer is also at least size required + to hold the string with the '\0' terminator. If an error occurs NULL + is returned. + + So for example: + + bstring b = bfromcstralloc (64, someCstr); + if (b) b->data[63] = 'x'; + + The idea is that this will set the 64th character of b to 'x' if it is at + least 64 characters long otherwise do nothing. And we know this is well + defined so long as b was successfully created, since it will have been + allocated with at least 64 characters. + + .......................................................................... + + extern bstring bfromcstrrangealloc (int minl, int maxl, const char* str); + + Create a bstring which contains the contents of the '\0' terminated + char * buffer str. The memory buffer backing the string is at least + minl characters in length, but an attempt is made to allocate up to + maxl characters. The buffer is also at least size required to hold + the string with the '\0' terminator. If an error occurs NULL is + returned. + + So for example: + + bstring b = bfromcstrrangealloc (0, 128, "Hello."); + if (b) b->data[5] = '!'; + + The idea is that this will set the 6th character of b to '!' if it was + allocated otherwise do nothing. And we know this is well defined so + long as b was successfully created, since it will have been allocated + with at least 7 (strlen("Hello.")) characters. + + .......................................................................... + + extern bstring blk2bstr (const void * blk, int len); + + Create a bstring whose contents are described by the contiguous buffer + pointing to by blk with a length of len bytes. Note that this function + creates a copy of the data in blk, rather than simply referencing it. + Compare with the blk2tbstr macro. If an error occurs NULL is returned. + + .......................................................................... + + extern char * bstr2cstr (const_bstring s, char z); + + Create a '\0' terminated char buffer which contains the contents of the + bstring s, except that any contained '\0' characters are converted to the + character in z. This returned value should be freed with bcstrfree(), by + the caller. If an error occurs NULL is returned. + + .......................................................................... + + extern int bcstrfree (char * s); + + Frees a C-string generated by bstr2cstr (). This is normally unnecessary + since it just wraps a call to free (), however, if malloc () and free () + have been redefined as a macros within the bstrlib module (via macros in + the memdbg.h backdoor) with some difference in behaviour from the std + library functions, then this allows a correct way of freeing the memory + that allows higher level code to be independent from these macro + redefinitions. + + .......................................................................... + + extern bstring bstrcpy (const_bstring b1); + + Make a copy of the passed in bstring. The copied bstring is returned if + there is no error, otherwise NULL is returned. + + .......................................................................... + + extern int bassign (bstring a, const_bstring b); + + Overwrite the bstring a with the contents of bstring b. Note that the + bstring a must be a well defined and writable bstring. If an error + occurs BSTR_ERR is returned and a is not overwritten. + + .......................................................................... + + int bassigncstr (bstring a, const char * str); + + Overwrite the string a with the contents of char * string str. Note that + the bstring a must be a well defined and writable bstring. If an error + occurs BSTR_ERR is returned and a may be partially overwritten. + + .......................................................................... + + int bassignblk (bstring a, const void * s, int len); + + Overwrite the string a with the contents of the block (s, len). Note that + the bstring a must be a well defined and writable bstring. If an error + occurs BSTR_ERR is returned and a is not overwritten. + + .......................................................................... + + extern int bassignmidstr (bstring a, const_bstring b, int left, int len); + + Overwrite the bstring a with the middle of contents of bstring b + starting from position left and running for a length len. left and + len are clamped to the ends of b as with the function bmidstr. Note that + the bstring a must be a well defined and writable bstring. If an error + occurs BSTR_ERR is returned and a is not overwritten. + + .......................................................................... + + extern bstring bmidstr (const_bstring b, int left, int len); + + Create a bstring which is the substring of b starting from position left + and running for a length len (clamped by the end of the bstring b.) If + there was no error, the value of this constructed bstring is returned + otherwise NULL is returned. + + .......................................................................... + + extern int bdelete (bstring s1, int pos, int len); + + Removes characters from pos to pos+len-1 and shifts the tail of the + bstring starting from pos+len to pos. len must be positive for this call + to have any effect. The section of the bstring described by (pos, len) + is clamped to boundaries of the bstring b. The value BSTR_OK is returned + if the operation is successful, otherwise BSTR_ERR is returned. + + .......................................................................... + + extern int bconcat (bstring b0, const_bstring b1); + + Concatenate the bstring b1 to the end of bstring b0. The value BSTR_OK + is returned if the operation is successful, otherwise BSTR_ERR is + returned. + + .......................................................................... + + extern int bconchar (bstring b, char c); + + Concatenate the character c to the end of bstring b. The value BSTR_OK + is returned if the operation is successful, otherwise BSTR_ERR is + returned. + + .......................................................................... + + extern int bcatcstr (bstring b, const char * s); + + Concatenate the char * string s to the end of bstring b. The value + BSTR_OK is returned if the operation is successful, otherwise BSTR_ERR is + returned. + + .......................................................................... + + extern int bcatblk (bstring b, const void * s, int len); + + Concatenate a fixed length buffer (s, len) to the end of bstring b. The + value BSTR_OK is returned if the operation is successful, otherwise + BSTR_ERR is returned. + + .......................................................................... + + extern int biseq (const_bstring b0, const_bstring b1); + + Compare the bstring b0 and b1 for equality. If the bstrings differ, 0 + is returned, if the bstrings are the same, 1 is returned, if there is an + error, -1 is returned. If the length of the bstrings are different, this + function has O(1) complexity. Contained '\0' characters are not treated + as a termination character. + + Note that the semantics of biseq are not completely compatible with + bstrcmp because of its different treatment of the '\0' character. + + .......................................................................... + + extern int bisstemeqblk (const_bstring b, const void * blk, int len); + + Compare beginning of bstring b0 with a block of memory of length len for + equality. If the beginning of b0 differs from the memory block (or if b0 + is too short), 0 is returned, if the bstrings are the same, 1 is returned, + if there is an error, -1 is returned. + + .......................................................................... + + extern int biseqcaseless (const_bstring b0, const_bstring b1); + + Compare two bstrings for equality without differentiating between case. + If the bstrings differ other than in case, 0 is returned, if the bstrings + are the same, 1 is returned, if there is an error, -1 is returned. If + the length of the bstrings are different, this function is O(1). '\0' + termination characters are not treated in any special way. + + .......................................................................... + + extern int biseqcaselessblk (const_bstring b, const void * blk, int len); + + Compare content of b and the array of bytes in blk for length len for + equality without differentiating between character case. If the content + differs other than in case, 0 is returned, if, ignoring case, the content + is the same, 1 is returned, if there is an error, -1 is returned. If the + length of the strings are different, this function is O(1). '\0' + termination characters are not treated in any special way. + + .......................................................................... + + extern int bisstemeqcaselessblk (const_bstring b0, const void * blk, int len); + + Compare beginning of bstring b0 with a block of memory of length len + without differentiating between case for equality. If the beginning of b0 + differs from the memory block other than in case (or if b0 is too short), + 0 is returned, if the bstrings are the same, 1 is returned, if there is an + error, -1 is returned. + + .......................................................................... + + int biseqblk (const_bstring b, const void * blk, int len) + + Compare the string b with the character block blk of length len. If the + content differs, 0 is returned, if the content is the same, 1 is returned, + if there is an error, -1 is returned. If the length of the strings are + different, this function is O(1). '\0' characters are not treated in + any special way. + + .......................................................................... + + extern int biseqcstr (const_bstring b, const char *s); + + Compare the bstring b and char * bstring s. The C string s must be '\0' + terminated at exactly the length of the bstring b, and the contents + between the two must be identical with the bstring b with no '\0' + characters for the two contents to be considered equal. This is + equivalent to the condition that their current contents will be always be + equal when comparing them in the same format after converting one or the + other. If they are equal 1 is returned, if they are unequal 0 is + returned and if there is a detectable error BSTR_ERR is returned. + + .......................................................................... + + extern int biseqcstrcaseless (const_bstring b, const char *s); + + Compare the bstring b and char * string s. The C string s must be '\0' + terminated at exactly the length of the bstring b, and the contents + between the two must be identical except for case with the bstring b with + no '\0' characters for the two contents to be considered equal. This is + equivalent to the condition that their current contents will be always be + equal ignoring case when comparing them in the same format after + converting one or the other. If they are equal, except for case, 1 is + returned, if they are unequal regardless of case 0 is returned and if + there is a detectable error BSTR_ERR is returned. + + .......................................................................... + + extern int bstrcmp (const_bstring b0, const_bstring b1); + + Compare the bstrings b0 and b1 for ordering. If there is an error, + SHRT_MIN is returned, otherwise a value less than or greater than zero, + indicating that the bstring pointed to by b0 is lexicographically less + than or greater than the bstring pointed to by b1 is returned. If the + bstring lengths are unequal but the characters up until the length of the + shorter are equal then a value less than, or greater than zero, + indicating that the bstring pointed to by b0 is shorter or longer than the + bstring pointed to by b1 is returned. 0 is returned if and only if the + two bstrings are the same. If the length of the bstrings are different, + this function is O(n). Like its standard C library counter part, the + comparison does not proceed past any '\0' termination characters + encountered. + + The seemingly odd error return value, merely provides slightly more + granularity than the undefined situation given in the C library function + strcmp. The function otherwise behaves very much like strcmp(). + + Note that the semantics of bstrcmp are not completely compatible with + biseq because of its different treatment of the '\0' termination + character. + + .......................................................................... + + extern int bstrncmp (const_bstring b0, const_bstring b1, int n); + + Compare the bstrings b0 and b1 for ordering for at most n characters. If + there is an error, SHRT_MIN is returned, otherwise a value is returned as + if b0 and b1 were first truncated to at most n characters then bstrcmp + was called with these new bstrings are paremeters. If the length of the + bstrings are different, this function is O(n). Like its standard C + library counter part, the comparison does not proceed past any '\0' + termination characters encountered. + + The seemingly odd error return value, merely provides slightly more + granularity than the undefined situation given in the C library function + strncmp. The function otherwise behaves very much like strncmp(). + + .......................................................................... + + extern int bstricmp (const_bstring b0, const_bstring b1); + + Compare two bstrings without differentiating between case. The return + value is the difference of the values of the characters where the two + bstrings first differ, otherwise 0 is returned indicating that the + bstrings are equal. If the lengths are different, then a difference from + 0 is given, but if the first extra character is '\0', then it is taken to + be the value UCHAR_MAX+1. + + .......................................................................... + + extern int bstrnicmp (const_bstring b0, const_bstring b1, int n); + + Compare two bstrings without differentiating between case for at most n + characters. If the position where the two bstrings first differ is + before the nth position, the return value is the difference of the values + of the characters, otherwise 0 is returned. If the lengths are different + and less than n characters, then a difference from 0 is given, but if the + first extra character is '\0', then it is taken to be the value + UCHAR_MAX+1. + + .......................................................................... + + extern int bdestroy (bstring b); + + Deallocate the bstring passed. Passing NULL in as a parameter will have + no effect. Note that both the header and the data portion of the bstring + will be freed. No other bstring function which modifies one of its + parameters will free or reallocate the header. Because of this, in + general, bdestroy cannot be called on any declared struct tagbstring even + if it is not write protected. A bstring which is write protected cannot + be destroyed via the bdestroy call. Any attempt to do so will result in + no action taken, and BSTR_ERR will be returned. + + Note to C++ users: Passing in a CBString cast to a bstring will lead to + undefined behavior (free will be called on the header, rather than the + CBString destructor.) Instead just use the ordinary C++ language + facilities to dealloc a CBString. + + .......................................................................... + + extern int binstr (const_bstring s1, int pos, const_bstring s2); + + Search for the bstring s2 in s1 starting at position pos and looking in a + forward (increasing) direction. If it is found then it returns with the + first position after pos where it is found, otherwise it returns BSTR_ERR. + The algorithm used is brute force; O(m*n). + + .......................................................................... + + extern int binstrr (const_bstring s1, int pos, const_bstring s2); + + Search for the bstring s2 in s1 starting at position pos and looking in a + backward (decreasing) direction. If it is found then it returns with the + first position after pos where it is found, otherwise return BSTR_ERR. + Note that the current position at pos is tested as well -- so to be + disjoint from a previous forward search it is recommended that the + position be backed up (decremented) by one position. The algorithm used + is brute force; O(m*n). + + .......................................................................... + + extern int binstrcaseless (const_bstring s1, int pos, const_bstring s2); + + Search for the bstring s2 in s1 starting at position pos and looking in a + forward (increasing) direction but without regard to case. If it is + found then it returns with the first position after pos where it is + found, otherwise it returns BSTR_ERR. The algorithm used is brute force; + O(m*n). + + .......................................................................... + + extern int binstrrcaseless (const_bstring s1, int pos, const_bstring s2); + + Search for the bstring s2 in s1 starting at position pos and looking in a + backward (decreasing) direction but without regard to case. If it is + found then it returns with the first position after pos where it is + found, otherwise return BSTR_ERR. Note that the current position at pos + is tested as well -- so to be disjoint from a previous forward search it + is recommended that the position be backed up (decremented) by one + position. The algorithm used is brute force; O(m*n). + + .......................................................................... + + extern int binchr (const_bstring b0, int pos, const_bstring b1); + + Search for the first position in b0 starting from pos or after, in which + one of the characters in b1 is found. This function has an execution + time of O(b0->slen + b1->slen). If such a position does not exist in b0, + then BSTR_ERR is returned. + + .......................................................................... + + extern int binchrr (const_bstring b0, int pos, const_bstring b1); + + Search for the last position in b0 no greater than pos, in which one of + the characters in b1 is found. This function has an execution time + of O(b0->slen + b1->slen). If such a position does not exist in b0, + then BSTR_ERR is returned. + + .......................................................................... + + extern int bninchr (const_bstring b0, int pos, const_bstring b1); + + Search for the first position in b0 starting from pos or after, in which + none of the characters in b1 is found and return it. This function has + an execution time of O(b0->slen + b1->slen). If such a position does + not exist in b0, then BSTR_ERR is returned. + + .......................................................................... + + extern int bninchrr (const_bstring b0, int pos, const_bstring b1); + + Search for the last position in b0 no greater than pos, in which none of + the characters in b1 is found and return it. This function has an + execution time of O(b0->slen + b1->slen). If such a position does not + exist in b0, then BSTR_ERR is returned. + + .......................................................................... + + extern int bstrchr (const_bstring b, int c); + + Search for the character c in the bstring b forwards from the start of + the bstring. Returns the position of the found character or BSTR_ERR if + it is not found. + + NOTE: This has been implemented as a macro on top of bstrchrp (). + + .......................................................................... + + extern int bstrrchr (const_bstring b, int c); + + Search for the character c in the bstring b backwards from the end of the + bstring. Returns the position of the found character or BSTR_ERR if it is + not found. + + NOTE: This has been implemented as a macro on top of bstrrchrp (). + + .......................................................................... + + extern int bstrchrp (const_bstring b, int c, int pos); + + Search for the character c in b forwards from the position pos + (inclusive). Returns the position of the found character or BSTR_ERR if + it is not found. + + .......................................................................... + + extern int bstrrchrp (const_bstring b, int c, int pos); + + Search for the character c in b backwards from the position pos in bstring + (inclusive). Returns the position of the found character or BSTR_ERR if + it is not found. + + .......................................................................... + + extern int bsetstr (bstring b0, int pos, const_bstring b1, unsigned char fill); + + Overwrite the bstring b0 starting at position pos with the bstring b1. If + the position pos is past the end of b0, then the character "fill" is + appended as necessary to make up the gap between the end of b0 and pos. + If b1 is NULL, it behaves as if it were a 0-length bstring. The value + BSTR_OK is returned if the operation is successful, otherwise BSTR_ERR is + returned. + + .......................................................................... + + extern int binsert (bstring s1, int pos, const_bstring s2, unsigned char fill); + + Inserts the bstring s2 into s1 at position pos. If the position pos is + past the end of s1, then the character "fill" is appended as necessary to + make up the gap between the end of s1 and pos. The value BSTR_OK is + returned if the operation is successful, otherwise BSTR_ERR is returned. + + .......................................................................... + + int binsertblk (bstring b, int pos, const void * blk, int len, + unsigned char fill) + + Inserts the block of characters at blk with length len into b at position + pos. If the position pos is past the end of b, then the character "fill" + is appended as necessary to make up the gap between the end of b1 and pos. + Unlike bsetstr, binsert does not allow b2 to be NULL. + + .......................................................................... + + extern int binsertch (bstring s1, int pos, int len, unsigned char fill); + + Inserts the character fill repeatedly into s1 at position pos for a + length len. If the position pos is past the end of s1, then the + character "fill" is appended as necessary to make up the gap between the + end of s1 and the position pos + len (exclusive). The value BSTR_OK is + returned if the operation is successful, otherwise BSTR_ERR is returned. + + .......................................................................... + + extern int breplace (bstring b1, int pos, int len, const_bstring b2, + unsigned char fill); + + Replace a section of a bstring from pos for a length len with the bstring + b2. If the position pos is past the end of b1 then the character "fill" + is appended as necessary to make up the gap between the end of b1 and + pos. + + .......................................................................... + + extern int bfindreplace (bstring b, const_bstring find, + const_bstring replace, int position); + + Replace all occurrences of the find substring with a replace bstring + after a given position in the bstring b. The find bstring must have a + length > 0 otherwise BSTR_ERR is returned. This function does not + perform recursive per character replacement; that is to say successive + searches resume at the position after the last replace. + + So for example: + + bfindreplace (a0 = bfromcstr("aabaAb"), a1 = bfromcstr("a"), + a2 = bfromcstr("aa"), 0); + + Should result in changing a0 to "aaaabaaAb". + + This function performs exactly (b->slen - position) bstring comparisons, + and data movement is bounded above by character volume equivalent to size + of the output bstring. + + .......................................................................... + + extern int bfindreplacecaseless (bstring b, const_bstring find, + const_bstring replace, int position); + + Replace all occurrences of the find substring, ignoring case, with a + replace bstring after a given position in the bstring b. The find bstring + must have a length > 0 otherwise BSTR_ERR is returned. This function + does not perform recursive per character replacement; that is to say + successive searches resume at the position after the last replace. + + So for example: + + bfindreplacecaseless (a0 = bfromcstr("AAbaAb"), a1 = bfromcstr("a"), + a2 = bfromcstr("aa"), 0); + + Should result in changing a0 to "aaaabaaaab". + + This function performs exactly (b->slen - position) bstring comparisons, + and data movement is bounded above by character volume equivalent to size + of the output bstring. + + .......................................................................... + + extern int balloc (bstring b, int length); + + Increase the allocated memory backing the data buffer for the bstring b + to a length of at least length. If the memory backing the bstring b is + already large enough, not action is performed. This has no effect on the + bstring b that is visible to the bstring API. Usually this function will + only be used when a minimum buffer size is required coupled with a direct + access to the ->data member of the bstring structure. + + Be warned that like any other bstring function, the bstring must be well + defined upon entry to this function. I.e., doing something like: + + b->slen *= 2; /* ?? Most likely incorrect */ + balloc (b, b->slen); + + is invalid, and should be implemented as: + + int t; + if (BSTR_OK == balloc (b, t = (b->slen * 2))) b->slen = t; + + This function will return with BSTR_ERR if b is not detected as a valid + bstring or length is not greater than 0, otherwise BSTR_OK is returned. + + .......................................................................... + + extern int ballocmin (bstring b, int length); + + Change the amount of memory backing the bstring b to at least length. + This operation will never truncate the bstring data including the + extra terminating '\0' and thus will not decrease the length to less than + b->slen + 1. Note that repeated use of this function may cause + performance problems (realloc may be called on the bstring more than + the O(log(INT_MAX)) times). This function will return with BSTR_ERR if b + is not detected as a valid bstring or length is not greater than 0, + otherwise BSTR_OK is returned. + + So for example: + + if (BSTR_OK == ballocmin (b, 64)) b->data[63] = 'x'; + + The idea is that this will set the 64th character of b to 'x' if it is at + least 64 characters long otherwise do nothing. And we know this is well + defined so long as the ballocmin call was successfully, since it will + ensure that b has been allocated with at least 64 characters. + + .......................................................................... + + int btrunc (bstring b, int n); + + Truncate the bstring to at most n characters. This function will return + with BSTR_ERR if b is not detected as a valid bstring or n is less than + 0, otherwise BSTR_OK is returned. + + .......................................................................... + + extern int bpattern (bstring b, int len); + + Replicate the starting bstring, b, end to end repeatedly until it + surpasses len characters, then chop the result to exactly len characters. + This function operates in-place. This function will return with BSTR_ERR + if b is NULL or of length 0, otherwise BSTR_OK is returned. + + .......................................................................... + + extern int btoupper (bstring b); + + Convert contents of bstring to upper case. This function will return with + BSTR_ERR if b is NULL or of length 0, otherwise BSTR_OK is returned. + + .......................................................................... + + extern int btolower (bstring b); + + Convert contents of bstring to lower case. This function will return with + BSTR_ERR if b is NULL or of length 0, otherwise BSTR_OK is returned. + + .......................................................................... + + extern int bltrimws (bstring b); + + Delete whitespace contiguous from the left end of the bstring. This + function will return with BSTR_ERR if b is NULL or of length 0, otherwise + BSTR_OK is returned. + + .......................................................................... + + extern int brtrimws (bstring b); + + Delete whitespace contiguous from the right end of the bstring. This + function will return with BSTR_ERR if b is NULL or of length 0, otherwise + BSTR_OK is returned. + + .......................................................................... + + extern int btrimws (bstring b); + + Delete whitespace contiguous from both ends of the bstring. This function + will return with BSTR_ERR if b is NULL or of length 0, otherwise BSTR_OK + is returned. + + .......................................................................... + + extern struct bstrList* bstrListCreate (void); + + Create an empty struct bstrList. The struct bstrList output structure is + declared as follows: + + struct bstrList { + int qty, mlen; + bstring * entry; + }; + + The entry field actually is an array with qty number entries. The mlen + record counts the maximum number of bstring's for which there is memory + in the entry record. + + The Bstrlib API does *NOT* include a comprehensive set of functions for + full management of struct bstrList in an abstracted way. The reason for + this is because aliasing semantics of the list are best left to the user + of this function, and performance varies wildly depending on the + assumptions made. For a complete list of bstring data type it is + recommended that the C++ public std::vector<CBString> be used, since its + semantics and usage are more standard. + + .......................................................................... + + extern int bstrListDestroy (struct bstrList * sl); + + Destroy a struct bstrList structure that was returned by the bsplit + function. Note that this will destroy each bstring in the ->entry array + as well. See bstrListCreate() above for structure of struct bstrList. + + .......................................................................... + + extern int bstrListAlloc (struct bstrList * sl, int msz); + + Ensure that there is memory for at least msz number of entries for the + list. + + .......................................................................... + + extern int bstrListAllocMin (struct bstrList * sl, int msz); + + Try to allocate the minimum amount of memory for the list to include at + least msz entries or sl->qty whichever is greater. + + .......................................................................... + + extern struct bstrList * bsplit (bstring str, unsigned char splitChar); + + Create an array of sequential substrings from str divided by the + character splitChar. Successive occurrences of the splitChar will be + divided by empty bstring entries, following the semantics from the Python + programming language. To reclaim the memory from this output structure, + bstrListDestroy () should be called. See bstrListCreate() above for + structure of struct bstrList. + + .......................................................................... + + extern struct bstrList * bsplits (bstring str, const_bstring splitStr); + + Create an array of sequential substrings from str divided by any + character contained in splitStr. An empty splitStr causes a single entry + bstrList containing a copy of str to be returned. See bstrListCreate() + above for structure of struct bstrList. + + .......................................................................... + + extern struct bstrList * bsplitstr (bstring str, const_bstring splitStr); + + Create an array of sequential substrings from str divided by the entire + substring splitStr. An empty splitStr causes a single entry bstrList + containing a copy of str to be returned. See bstrListCreate() above for + structure of struct bstrList. + + .......................................................................... + + extern bstring bjoin (const struct bstrList * bl, const_bstring sep); + + Join the entries of a bstrList into one bstring by sequentially + concatenating them with the sep bstring in between. If sep is NULL, it + is treated as if it were the empty bstring. Note that: + + bjoin (l = bsplit (b, s->data[0]), s); + + should result in a copy of b, if s->slen is 1. If there is an error NULL + is returned, otherwise a bstring with the correct result is returned. + See bstrListCreate() above for structure of struct bstrList. + + .......................................................................... + + bstring bjoinblk (const struct bstrList * bl, void * blk, int len); + + Join the entries of a bstrList into one bstring by sequentially + concatenating them with the content from blk for length len in between. + If there is an error NULL is returned, otherwise a bstring with the + correct result is returned. + + .......................................................................... + + extern int bsplitcb (const_bstring str, unsigned char splitChar, int pos, + int (* cb) (void * parm, int ofs, int len), void * parm); + + Iterate the set of disjoint sequential substrings over str starting at + position pos divided by the character splitChar. The parm passed to + bsplitcb is passed on to cb. If the function cb returns a value < 0, + then further iterating is halted and this value is returned by bsplitcb. + + Note: Non-destructive modification of str from within the cb function + while performing this split is not undefined. bsplitcb behaves in + sequential lock step with calls to cb. I.e., after returning from a cb + that return a non-negative integer, bsplitcb continues from the position + 1 character after the last detected split character and it will halt + immediately if the length of str falls below this point. However, if the + cb function destroys str, then it *must* return with a negative value, + otherwise bsplitcb will continue in an undefined manner. + + This function is provided as an incremental alternative to bsplit that is + abortable and which does not impose additional memory allocation. + + .......................................................................... + + extern int bsplitscb (const_bstring str, const_bstring splitStr, int pos, + int (* cb) (void * parm, int ofs, int len), void * parm); + + Iterate the set of disjoint sequential substrings over str starting at + position pos divided by any of the characters in splitStr. An empty + splitStr causes the whole str to be iterated once. The parm passed to + bsplitcb is passed on to cb. If the function cb returns a value < 0, + then further iterating is halted and this value is returned by bsplitcb. + + Note: Non-destructive modification of str from within the cb function + while performing this split is not undefined. bsplitscb behaves in + sequential lock step with calls to cb. I.e., after returning from a cb + that return a non-negative integer, bsplitscb continues from the position + 1 character after the last detected split character and it will halt + immediately if the length of str falls below this point. However, if the + cb function destroys str, then it *must* return with a negative value, + otherwise bsplitscb will continue in an undefined manner. + + This function is provided as an incremental alternative to bsplits that + is abortable and which does not impose additional memory allocation. + + .......................................................................... + + extern int bsplitstrcb (const_bstring str, const_bstring splitStr, int pos, + int (* cb) (void * parm, int ofs, int len), void * parm); + + Iterate the set of disjoint sequential substrings over str starting at + position pos divided by the entire substring splitStr. An empty splitStr + causes each character of str to be iterated. The parm passed to bsplitcb + is passed on to cb. If the function cb returns a value < 0, then further + iterating is halted and this value is returned by bsplitcb. + + Note: Non-destructive modification of str from within the cb function + while performing this split is not undefined. bsplitstrcb behaves in + sequential lock step with calls to cb. I.e., after returning from a cb + that return a non-negative integer, bsplitstrcb continues from the position + 1 character after the last detected split character and it will halt + immediately if the length of str falls below this point. However, if the + cb function destroys str, then it *must* return with a negative value, + otherwise bsplitscb will continue in an undefined manner. + + This function is provided as an incremental alternative to bsplitstr that + is abortable and which does not impose additional memory allocation. + + .......................................................................... + + extern bstring bformat (const char * fmt, ...); + + Takes the same parameters as printf (), but rather than outputting + results to stdio, it forms a bstring which contains what would have been + output. Note that if there is an early generation of a '\0' character, + the bstring will be truncated to this end point. + + Note that %s format tokens correspond to '\0' terminated char * buffers, + not bstrings. To print a bstring, first dereference data element of the + the bstring: + + /* b1->data needs to be '\0' terminated, so tagbstrings generated + by blk2tbstr () might not be suitable. */ + b0 = bformat ("Hello, %s", b1->data); + + Note that if the BSTRLIB_NOVSNP macro has been set when bstrlib has been + compiled the bformat function is not present. + + .......................................................................... + + extern int bformata (bstring b, const char * fmt, ...); + + In addition to the initial output buffer b, bformata takes the same + parameters as printf (), but rather than outputting results to stdio, it + appends the results to the initial bstring parameter. Note that if + there is an early generation of a '\0' character, the bstring will be + truncated to this end point. + + Note that %s format tokens correspond to '\0' terminated char * buffers, + not bstrings. To print a bstring, first dereference data element of the + the bstring: + + /* b1->data needs to be '\0' terminated, so tagbstrings generated + by blk2tbstr () might not be suitable. */ + bformata (b0 = bfromcstr ("Hello"), ", %s", b1->data); + + Note that if the BSTRLIB_NOVSNP macro has been set when bstrlib has been + compiled the bformata function is not present. + + .......................................................................... + + extern int bassignformat (bstring b, const char * fmt, ...); + + After the first parameter, it takes the same parameters as printf (), but + rather than outputting results to stdio, it outputs the results to + the bstring parameter b. Note that if there is an early generation of a + '\0' character, the bstring will be truncated to this end point. + + Note that %s format tokens correspond to '\0' terminated char * buffers, + not bstrings. To print a bstring, first dereference data element of the + the bstring: + + /* b1->data needs to be '\0' terminated, so tagbstrings generated + by blk2tbstr () might not be suitable. */ + bassignformat (b0 = bfromcstr ("Hello"), ", %s", b1->data); + + Note that if the BSTRLIB_NOVSNP macro has been set when bstrlib has been + compiled the bassignformat function is not present. + + .......................................................................... + + extern int bvcformata (bstring b, int count, const char * fmt, va_list arglist); + + The bvcformata function formats data under control of the format control + string fmt and attempts to append the result to b. The fmt parameter is + the same as that of the printf function. The variable argument list is + replaced with arglist, which has been initialized by the va_start macro. + The size of the output is upper bounded by count. If the required output + exceeds count, the string b is not augmented with any contents and a value + below BSTR_ERR is returned. If a value below -count is returned then it + is recommended that the negative of this value be used as an update to the + count in a subsequent pass. On other errors, such as running out of + memory, parameter errors or numeric wrap around BSTR_ERR is returned. + BSTR_OK is returned when the output is successfully generated and + appended to b. + + Note: There is no sanity checking of arglist, and this function is + destructive of the contents of b from the b->slen point onward. If there + is an early generation of a '\0' character, the bstring will be truncated + to this end point. + + Although this function is part of the external API for Bstrlib, the + interface and semantics (length limitations, and unusual return codes) + are fairly atypical. The real purpose for this function is to provide an + engine for the bvformata macro. + + Note that if the BSTRLIB_NOVSNP macro has been set when bstrlib has been + compiled the bvcformata function is not present. + + .......................................................................... + + extern bstring bread (bNread readPtr, void * parm); + typedef size_t (* bNread) (void *buff, size_t elsize, size_t nelem, + void *parm); + + Read an entire stream into a bstring, verbatum. The readPtr function + pointer is compatible with fread sematics, except that it need not obtain + the stream data from a file. The intention is that parm would contain + the stream data context/state required (similar to the role of the FILE* + I/O stream parameter of fread.) + + Abstracting the block read function allows for block devices other than + file streams to be read if desired. Note that there is an ANSI + compatibility issue if "fread" is used directly; see the ANSI issues + section below. + + .......................................................................... + + extern int breada (bstring b, bNread readPtr, void * parm); + + Read an entire stream and append it to a bstring, verbatum. Behaves + like bread, except that it appends it results to the bstring b. + BSTR_ERR is returned on error, otherwise 0 is returned. + + .......................................................................... + + extern bstring bgets (bNgetc getcPtr, void * parm, char terminator); + typedef int (* bNgetc) (void * parm); + + Read a bstring from a stream. As many bytes as is necessary are read + until the terminator is consumed or no more characters are available from + the stream. If read from the stream, the terminator character will be + appended to the end of the returned bstring. The getcPtr function must + have the same semantics as the fgetc C library function (i.e., returning + an integer whose value is negative when there are no more characters + available, otherwise the value of the next available unsigned character + from the stream.) The intention is that parm would contain the stream + data context/state required (similar to the role of the FILE* I/O stream + parameter of fgets.) If no characters are read, or there is some other + detectable error, NULL is returned. + + bgets will never call the getcPtr function more often than necessary to + construct its output (including a single call, if required, to determine + that the stream contains no more characters.) + + Abstracting the character stream function and terminator character allows + for different stream devices and string formats other than '\n' + terminated lines in a file if desired (consider \032 terminated email + messages, in a UNIX mailbox for example.) + + For files, this function can be used analogously as fgets as follows: + + fp = fopen ( ... ); + if (fp) b = bgets ((bNgetc) fgetc, fp, '\n'); + + (Note that only one terminator character can be used, and that '\0' is + not assumed to terminate the stream in addition to the terminator + character. This is consistent with the semantics of fgets.) + + .......................................................................... + + extern int bgetsa (bstring b, bNgetc getcPtr, void * parm, char terminator); + + Read from a stream and concatenate to a bstring. Behaves like bgets, + except that it appends it results to the bstring b. The value 1 is + returned if no characters are read before a negative result is returned + from getcPtr. Otherwise BSTR_ERR is returned on error, and 0 is returned + in other normal cases. + + .......................................................................... + + extern int bassigngets (bstring b, bNgetc getcPtr, void * parm, char terminator); + + Read from a stream and concatenate to a bstring. Behaves like bgets, + except that it assigns the results to the bstring b. The value 1 is + returned if no characters are read before a negative result is returned + from getcPtr. Otherwise BSTR_ERR is returned on error, and 0 is returned + in other normal cases. + + .......................................................................... + + extern struct bStream * bsopen (bNread readPtr, void * parm); + + Wrap a given open stream (described by a fread compatible function + pointer and stream handle) into an open bStream suitable for the bstring + library streaming functions. + + .......................................................................... + + extern void * bsclose (struct bStream * s); + + Close the bStream, and return the handle to the stream that was + originally used to open the given stream. If s is NULL or detectably + invalid, NULL will be returned. + + .......................................................................... + + extern int bsbufflength (struct bStream * s, int sz); + + Set the length of the buffer used by the bStream. If sz is the macro + BSTR_BS_BUFF_LENGTH_GET (which is 0), the length is not set. If s is + NULL or sz is negative, the function will return with BSTR_ERR, otherwise + this function returns with the previous length. + + .......................................................................... + + extern int bsreadln (bstring r, struct bStream * s, char terminator); + + Read a bstring terminated by the terminator character or the end of the + stream from the bStream (s) and return it into the parameter r. The + matched terminator, if found, appears at the end of the line read. If + the stream has been exhausted of all available data, before any can be + read, BSTR_ERR is returned. This function may read additional characters + into the stream buffer from the core stream that are not returned, but + will be retained for subsequent read operations. When reading from high + speed streams, this function can perform significantly faster than bgets. + + .......................................................................... + + extern int bsreadlna (bstring r, struct bStream * s, char terminator); + + Read a bstring terminated by the terminator character or the end of the + stream from the bStream (s) and concatenate it to the parameter r. The + matched terminator, if found, appears at the end of the line read. If + the stream has been exhausted of all available data, before any can be + read, BSTR_ERR is returned. This function may read additional characters + into the stream buffer from the core stream that are not returned, but + will be retained for subsequent read operations. When reading from high + speed streams, this function can perform significantly faster than bgets. + + .......................................................................... + + extern int bsreadlns (bstring r, struct bStream * s, bstring terminators); + + Read a bstring terminated by any character in the terminators bstring or + the end of the stream from the bStream (s) and return it into the + parameter r. This function may read additional characters from the core + stream that are not returned, but will be retained for subsequent read + operations. + + .......................................................................... + + extern int bsreadlnsa (bstring r, struct bStream * s, bstring terminators); + + Read a bstring terminated by any character in the terminators bstring or + the end of the stream from the bStream (s) and concatenate it to the + parameter r. If the stream has been exhausted of all available data, + before any can be read, BSTR_ERR is returned. This function may read + additional characters from the core stream that are not returned, but + will be retained for subsequent read operations. + + .......................................................................... + + extern int bsread (bstring r, struct bStream * s, int n); + + Read a bstring of length n (or, if it is fewer, as many bytes as is + remaining) from the bStream. This function will read the minimum + required number of additional characters from the core stream. When the + stream is at the end of the file BSTR_ERR is returned, otherwise BSTR_OK + is returned. + + .......................................................................... + + extern int bsreada (bstring r, struct bStream * s, int n); + + Read a bstring of length n (or, if it is fewer, as many bytes as is + remaining) from the bStream and concatenate it to the parameter r. This + function will read the minimum required number of additional characters + from the core stream. When the stream is at the end of the file BSTR_ERR + is returned, otherwise BSTR_OK is returned. + + .......................................................................... + + extern int bsunread (struct bStream * s, const_bstring b); + + Insert a bstring into the bStream at the current position. These + characters will be read prior to those that actually come from the core + stream. + + .......................................................................... + + extern int bspeek (bstring r, const struct bStream * s); + + Return the number of currently buffered characters from the bStream that + will be read prior to reads from the core stream, and append it to the + the parameter r. + + .......................................................................... + + extern int bssplitscb (struct bStream * s, const_bstring splitStr, + int (* cb) (void * parm, int ofs, const_bstring entry), void * parm); + + Iterate the set of disjoint sequential substrings over the stream s + divided by any character from the bstring splitStr. The parm passed to + bssplitscb is passed on to cb. If the function cb returns a value < 0, + then further iterating is halted and this return value is returned by + bssplitscb. + + Note: At the point of calling the cb function, the bStream pointer is + pointed exactly at the position right after having read the split + character. The cb function can act on the stream by causing the bStream + pointer to move, and bssplitscb will continue by starting the next split + at the position of the pointer after the return from cb. + + However, if the cb causes the bStream s to be destroyed then the cb must + return with a negative value, otherwise bssplitscb will continue in an + undefined manner. + + This function is provided as way to incrementally parse through a file + or other generic stream that in total size may otherwise exceed the + practical or desired memory available. As with the other split callback + based functions this is abortable and does not impose additional memory + allocation. + + .......................................................................... + + extern int bssplitstrcb (struct bStream * s, const_bstring splitStr, + int (* cb) (void * parm, int ofs, const_bstring entry), void * parm); + + Iterate the set of disjoint sequential substrings over the stream s + divided by the entire substring splitStr. The parm passed to + bssplitstrcb is passed on to cb. If the function cb returns a + value < 0, then further iterating is halted and this return value is + returned by bssplitstrcb. + + Note: At the point of calling the cb function, the bStream pointer is + pointed exactly at the position right after having read the split + character. The cb function can act on the stream by causing the bStream + pointer to move, and bssplitstrcb will continue by starting the next + split at the position of the pointer after the return from cb. + + However, if the cb causes the bStream s to be destroyed then the cb must + return with a negative value, otherwise bssplitscb will continue in an + undefined manner. + + This function is provided as way to incrementally parse through a file + or other generic stream that in total size may otherwise exceed the + practical or desired memory available. As with the other split callback + based functions this is abortable and does not impose additional memory + allocation. + + .......................................................................... + + extern int bseof (const struct bStream * s); + + Return the defacto "EOF" (end of file) state of a stream (1 if the + bStream is in an EOF state, 0 if not, and BSTR_ERR if stream is closed or + detectably erroneous.) When the readPtr callback returns a value <= 0 + the stream reaches its "EOF" state. Note that bunread with non-empty + content will essentially turn off this state, and the stream will not be + in its "EOF" state so long as its possible to read more data out of it. + + Also note that the semantics of bseof() are slightly different from + something like feof(). I.e., reaching the end of the stream does not + necessarily guarantee that bseof() will return with a value indicating + that this has happened. bseof() will only return indicating that it has + reached the "EOF" and an attempt has been made to read past the end of + the bStream. + +The macros +---------- + + The macros described below are shown in a prototype form indicating their + intended usage. Note that the parameters passed to these macros will be + referenced multiple times. As with all macros, programmer care is + required to guard against unintended side effects. + + int blengthe (const_bstring b, int err); + + Returns the length of the bstring. If the bstring is NULL err is + returned. + + .......................................................................... + + int blength (const_bstring b); + + Returns the length of the bstring. If the bstring is NULL, the length + returned is 0. + + .......................................................................... + + int bchare (const_bstring b, int p, int c); + + Returns the p'th character of the bstring b. If the position p refers to + a position that does not exist in the bstring or the bstring is NULL, + then c is returned. + + .......................................................................... + + char bchar (const_bstring b, int p); + + Returns the p'th character of the bstring b. If the position p refers to + a position that does not exist in the bstring or the bstring is NULL, + then '\0' is returned. + + .......................................................................... + + char * bdatae (bstring b, char * err); + + Returns the char * data portion of the bstring b. If b is NULL, err is + returned. + + .......................................................................... + + char * bdata (bstring b); + + Returns the char * data portion of the bstring b. If b is NULL, NULL is + returned. + + .......................................................................... + + char * bdataofse (bstring b, int ofs, char * err); + + Returns the char * data portion of the bstring b offset by ofs. If b is + NULL, err is returned. + + .......................................................................... + + char * bdataofs (bstring b, int ofs); + + Returns the char * data portion of the bstring b offset by ofs. If b is + NULL, NULL is returned. + + .......................................................................... + + struct tagbstring var = bsStatic ("..."); + + The bsStatic macro allows for static declarations of literal string + constants as struct tagbstring structures. The resulting tagbstring does + not need to be freed or destroyed. Note that this macro is only well + defined for string literal arguments. For more general string pointers, + use the btfromcstr macro. + + The resulting struct tagbstring is permanently write protected. Attempts + to write to this struct tagbstring from any bstrlib function will lead to + BSTR_ERR being returned. Invoking the bwriteallow macro onto this struct + tagbstring has no effect. + + .......................................................................... + + <void * blk, int len> <- bsStaticBlkParms ("...") + + The bsStaticBlkParms macro emits a pair of comma seperated parameters + corresponding to the block parameters for the block functions in Bstrlib + (i.e., blk2bstr, bcatblk, blk2tbstr, bisstemeqblk, bisstemeqcaselessblk.) + Note that this macro is only well defined for string literal arguments. + + Examples: + + bstring b = blk2bstr (bsStaticBlkParms ("Fast init. ")); + bcatblk (b, bsStaticBlkParms ("No frills fast concatenation.")); + + These are faster than using bfromcstr() and bcatcstr() respectively + because the length of the inline string is known as a compile time + constant. Also note that seperate struct tagbstring declarations for + holding the output of a bsStatic() macro are not required. + + .......................................................................... + + void btfromcstr (struct tagbstring& t, const char * s); + + Fill in the tagbstring t with the '\0' terminated char buffer s. This + action is purely reference oriented; no memory management is done. The + data member is just assigned s, and slen is assigned the strlen of s. + The s parameter is accessed exactly once in this macro. + + The resulting struct tagbstring is initially write protected. Attempts + to write to this struct tagbstring in a write protected state from any + bstrlib function will lead to BSTR_ERR being returned. Invoke the + bwriteallow on this struct tagbstring to make it writeable (though this + requires that s be obtained from a function compatible with malloc.) + + .......................................................................... + + void btfromblk (struct tagbstring& t, void * s, int len); + + Fill in the tagbstring t with the data buffer s with length len. This + action is purely reference oriented; no memory management is done. The + data member of t is just assigned s, and slen is assigned len. Note that + the buffer is not appended with a '\0' character. The s and len + parameters are accessed exactly once each in this macro. + + The resulting struct tagbstring is initially write protected. Attempts + to write to this struct tagbstring in a write protected state from any + bstrlib function will lead to BSTR_ERR being returned. Invoke the + bwriteallow on this struct tagbstring to make it writeable (though this + requires that s be obtained from a function compatible with malloc.) + + .......................................................................... + + void btfromblkltrimws (struct tagbstring& t, void * s, int len); + + Fill in the tagbstring t with the data buffer s with length len after it + has been left trimmed. This action is purely reference oriented; no + memory management is done. The data member of t is just assigned to a + pointer inside the buffer s. Note that the buffer is not appended with a + '\0' character. The s and len parameters are accessed exactly once each + in this macro. + + The resulting struct tagbstring is permanently write protected. Attempts + to write to this struct tagbstring from any bstrlib function will lead to + BSTR_ERR being returned. Invoking the bwriteallow macro onto this struct + tagbstring has no effect. + + .......................................................................... + + void btfromblkrtrimws (struct tagbstring& t, void * s, int len); + + Fill in the tagbstring t with the data buffer s with length len after it + has been right trimmed. This action is purely reference oriented; no + memory management is done. The data member of t is just assigned to a + pointer inside the buffer s. Note that the buffer is not appended with a + '\0' character. The s and len parameters are accessed exactly once each + in this macro. + + The resulting struct tagbstring is permanently write protected. Attempts + to write to this struct tagbstring from any bstrlib function will lead to + BSTR_ERR being returned. Invoking the bwriteallow macro onto this struct + tagbstring has no effect. + + .......................................................................... + + void btfromblktrimws (struct tagbstring& t, void * s, int len); + + Fill in the tagbstring t with the data buffer s with length len after it + has been left and right trimmed. This action is purely reference + oriented; no memory management is done. The data member of t is just + assigned to a pointer inside the buffer s. Note that the buffer is not + appended with a '\0' character. The s and len parameters are accessed + exactly once each in this macro. + + The resulting struct tagbstring is permanently write protected. Attempts + to write to this struct tagbstring from any bstrlib function will lead to + BSTR_ERR being returned. Invoking the bwriteallow macro onto this struct + tagbstring has no effect. + + .......................................................................... + + void bmid2tbstr (struct tagbstring& t, bstring b, int pos, int len); + + Fill the tagbstring t with the substring from b, starting from position + pos with a length len. The segment is clamped by the boundaries of + the bstring b. This action is purely reference oriented; no memory + management is done. Note that the buffer is not appended with a '\0' + character. Note that the t parameter to this macro may be accessed + multiple times. Note that the contents of t will become undefined + if the contents of b change or are destroyed. + + The resulting struct tagbstring is permanently write protected. Attempts + to write to this struct tagbstring in a write protected state from any + bstrlib function will lead to BSTR_ERR being returned. Invoking the + bwriteallow macro on this struct tagbstring will have no effect. + + .......................................................................... + + bstring bfromStatic("..."); + + Allocate a bstring with the contents of a string literal. Returns + NULL if an error has occurred (ran out of memory). The string literal + parameter is enforced as literal at compile time. + + .......................................................................... + + int bcatStatic (bstring b, "..."); + + Append a string literal to bstring b. Returns 0 if successful, or + BSTR_ERR if some error has occurred. The string literal parameter is + enforced as literal at compile time. + + .......................................................................... + + int binsertStatic (bstring s1, int pos, " ... ", char fill); + + Inserts the string literal into s1 at position pos. If the position pos + is past the end of s1, then the character "fill" is appended as necessary + to make up the gap between the end of s1 and pos. The value BSTR_OK is + returned if the operation is successful, otherwise BSTR_ERR is returned. + + .......................................................................... + + int bassignStatic (bstring b, " ... "); + + Assign the contents of a string literal to the bstring b. The string + literal parameter is enforced as literal at compile time. + + .......................................................................... + + int biseqStatic (const_bstring b, " ... "); + + Compare the string b with the string literal. If the content differs, 0 + is returned, if the content is the same, 1 is returned, if there is an + error, -1 is returned. If the length of the strings are different, this + function is O(1). '\0' characters are not treated in any special way. + + .......................................................................... + + int biseqcaselessStatic (const_bstring b, " ... "); + + Compare content of b and the string literal for equality without + differentiating between character case. If the content differs other + than in case, 0 is returned, if, ignoring case, the content is the same, + 1 is returned, if there is an error, -1 is returned. If the length of + the strings are different, this function is O(1). '\0' characters are + not treated in any special way. + + .......................................................................... + + int bisstemeqStatic (bstring b, " ... "); + + Compare beginning of bstring b with a string literal for equality. If + the beginning of b differs from the memory block (or if b is too short), + 0 is returned, if the bstrings are the same, 1 is returned, if there is + an error, -1 is returned. The string literal parameter is enforced as + literal at compile time. + + .......................................................................... + + int bisstemeqcaselessStatic (bstring b, " ... "); + + Compare beginning of bstring b with a string literal without + differentiating between case for equality. If the beginning of b differs + from the memory block other than in case (or if b is too short), 0 is + returned, if the bstrings are the same, 1 is returned, if there is an + error, -1 is returned. The string literal parameter is enforced as + literal at compile time. + + .......................................................................... + + bstring bjoinStatic (const struct bstrList * bl, " ... "); + + Join the entries of a bstrList into one bstring by sequentially + concatenating them with the string literal in between. If there is an + error NULL is returned, otherwise a bstring with the correct result is + returned. See bstrListCreate() above for structure of struct bstrList. + + .......................................................................... + + void bvformata (int& ret, bstring b, const char * format, lastarg); + + Append the bstring b with printf like formatting with the format control + string, and the arguments taken from the ... list of arguments after + lastarg passed to the containing function. If the containing function + does not have ... parameters or lastarg is not the last named parameter + before the ... then the results are undefined. If successful, the + results are appended to b and BSTR_OK is assigned to ret. Otherwise + BSTR_ERR is assigned to ret. + + Example: + + void dbgerror (FILE * fp, const char * fmt, ...) { + int ret; + bstring b; + bvformata (ret, b = bfromcstr ("DBG: "), fmt, fmt); + if (BSTR_OK == ret) fputs ((char *) bdata (b), fp); + bdestroy (b); + } + + Note that if the BSTRLIB_NOVSNP macro was set when bstrlib had been + compiled the bvformata macro will not link properly. If the + BSTRLIB_NOVSNP macro has been set, the bvformata macro will not be + available. + + .......................................................................... + + void bwriteprotect (struct tagbstring& t); + + Disallow bstring from being written to via the bstrlib API. Attempts to + write to the resulting tagbstring from any bstrlib function will lead to + BSTR_ERR being returned. + + Note: bstrings which are write protected cannot be destroyed via bdestroy. + + Note to C++ users: Setting a CBString as write protected will not prevent + it from being destroyed by the destructor. + + .......................................................................... + + void bwriteallow (struct tagbstring& t); + + Allow bstring to be written to via the bstrlib API. Note that such an + action makes the bstring both writable and destroyable. If the bstring is + not legitimately writable (as is the case for struct tagbstrings + initialized with a bsStatic value), the results of this are undefined. + + Note that invoking the bwriteallow macro may increase the number of + reallocs by one more than necessary for every call to bwriteallow + interleaved with any bstring API which writes to this bstring. + + .......................................................................... + + int biswriteprotected (struct tagbstring& t); + + Returns 1 if the bstring is write protected, otherwise 0 is returned. + +=============================================================================== + +Unicode functions +----------------- + + The two modules utf8util.c and buniutil.c implement basic functions for + parsing and collecting Unicode data in the UTF8 format. Unicode is + described by a sequence of "code points" which are values between 0 and + 1114111 inclusive mapped to symbol content corresponding to nearly all + the standardized scripts of the world. + + The semantics of Unicode code points is varied and complicated. The + base support of the better string library does not attempt to perform + any interpretation of these code points. The better string library + solely provides support for iterating through unicode code points, + appending and extracting code points to and from bstrings, and parsing + UTF8 and UTF16 from raw data. + + The types cpUcs4 and cpUcs2 respectively are defined as 4 byte and 2 byte + encoding formats corresponding to UCS4 and UCS2 respectively. To test + if a raw code point is valid, the macro isLegalUnicodeCodePoint() has + been defined. The utf8 iterator is defined by struct utf8Iterator. To + test if the iterator has more code points to walk through the macro + utf8IteratorNoMore() has been defined. + + To use these functions compile and link utf8util.c and buniutil.c + + .......................................................................... + + extern void utf8IteratorInit (struct utf8Iterator* iter, + unsigned char* data, int slen); + + Initialize a unicode utf8 iterator to traverse an array of utf8 encoded + code points pointed to by data, with length slen from the start. The + iterator iter is only valid for as long as the array it is pointed to + is valid and not modified. + + .......................................................................... + + extern void utf8IteratorUninit (struct utf8Iterator* iter); + + Invalidate utf8 iterator. After calling this the iterator iter, should + yield false when passed to the utf8IteratorNoMore() macro. + + .......................................................................... + + extern cpUcs4 utf8IteratorGetNextCodePoint (struct utf8Iterator* iter, + cpUcs4 errCh); + + Parse code point the iterator is pointing at and advance the iterator to + the next code point. If the iterator was pointing at a valid code point + the code point is returned, otherwise, errCh will be returned. + + .......................................................................... + + extern cpUcs4 utf8IteratorGetCurrCodePoint (struct utf8Iterator* iter, + cpUcs4 errCh); + + Parse code point the iterator is pointing at. If the iterator was + pointing at a valid code point the code point is returned, otherwise, + errCh will be returned. + + .......................................................................... + + extern int utf8ScanBackwardsForCodePoint (unsigned char* msg, int len, + int pos, cpUcs4* out); + + From the position "pos" in the array msg of length len, search for the + last position before or at pos where from which a valid Unicode code + point can be parsed. If such an offset is found it is returned otherwise + a negative value is returned. The code point parsed is put into *out if + it is not NULL. + + .......................................................................... + + extern int buIsUTF8Content (const_bstring bu); + + Scan a bstring and determine if it is made entirely of unicode code + valid points. If it is, 1 is returned, otherwise 0 is returned. + + .......................................................................... + + extern int buAppendBlkUcs4 (bstring b, const cpUcs4* bu, int len, + cpUcs4 errCh); + + Append the code points passed in the UCS4 format (raw numbers) in the + array bu of length len. Any unparsable characters are replaced by errCh. + If errCh is not a valid Unicode code point, then parsing errors will cause + BSTR_ERR to be returned. + + .......................................................................... + + extern int buGetBlkUTF16 (cpUcs2* ucs2, int len, cpUcs4 errCh, + const_bstring bu, int pos); + + Convert a string of UTF8 codepoints (bu), skipping the first pos, into a + sequence of UTF16 encoded code points. Returns the number of UCS2 16-bit + words written to the output. No more than len words are written to the + target array ucs2. If any code point in bu is unparsable, it will be + translated to errCh. + + .......................................................................... + + extern int buAppendBlkUTF16 (bstring bu, const cpUcs2* utf16, int len, + cpUcs2* bom, cpUcs4 errCh); + + Append an array of UCS2 code points (utf16) to UTF8 codepoints (bu). Any + invalid code point is replaced by errCh. If errCh is itself not a + valid code point, then this translation will halt upon the first error + and return BSTR_ERR. Otherwise BSTR_OK is returned. If a byte order mark + has been previously read, it may be passed in as bom, otherwise if *bom is + set to 0, it will be filled in with the BOM as read from the first + character if it is a BOM. + +=============================================================================== + +The bstest module +----------------- + +The bstest module is just a unit test for the bstrlib module. For correct +implementations of bstrlib, it should execute with 0 failures being reported. +This test should be utilized if modifications/customizations to bstrlib have +been performed. It tests each core bstrlib function with bstrings of every +mode (read-only, NULL, static and mutable) and ensures that the expected +semantics are observed (including results that should indicate an error). It +also tests for aliasing support. Passing bstest is a necessary but not a +sufficient condition for ensuring the correctness of the bstrlib module. + + +The test module +--------------- + +The test module is just a unit test for the bstrwrap module. For correct +implementations of bstrwrap, it should execute with 0 failures being +reported. This test should be utilized if modifications/customizations to +bstrwrap have been performed. It tests each core bstrwrap function with +CBStrings write protected or not and ensures that the expected semantics are +observed (including expected exceptions.) Note that exceptions cannot be +disabled to run this test. Passing test is a necessary but not a sufficient +condition for ensuring the correctness of the bstrwrap module. + +=============================================================================== + +Using Bstring and CBString as an alternative to the C library +------------------------------------------------------------- + +First let us give a table of C library functions and the alternative bstring +functions and CBString methods that should be used instead of them. + +C-library Bstring alternative CBString alternative +--------- ------------------- -------------------- +gets bgets ::gets +strcpy bassign = operator +strncpy bassignmidstr ::midstr +strcat bconcat += operator +strncat bconcat + btrunc += operator + ::trunc +strtok bsplit, bsplits ::split +sprintf b(assign)format ::format +snprintf b(assign)format + btrunc ::format + ::trunc +vsprintf bvformata bvformata + +vsnprintf bvformata + btrunc bvformata + btrunc +vfprintf bvformata + fputs use bvformata + fputs +strcmp biseq, bstrcmp comparison operators. +strncmp bstrncmp, memcmp bstrncmp, memcmp +strlen ->slen, blength ::length +strdup bstrcpy constructor +strset bpattern ::fill +strstr binstr ::find +strpbrk binchr ::findchr +stricmp bstricmp cast & use bstricmp +strlwr btolower cast & use btolower +strupr btoupper cast & use btoupper +strrev bReverse (aux module) cast & use bReverse +strchr bstrchr cast & use bstrchr +strspnp use strspn use strspn +ungetc bsunread bsunread + +The top 9 C functions listed here are troublesome in that they impose memory +management in the calling function. The Bstring and CBstring interfaces have +built-in memory management, so there is far less code with far less potential +for buffer overrun problems. strtok can only be reliably called as a "leaf" +calculation, since it (quite bizarrely) maintains hidden internal state. And +gets is well known to be broken no matter what. The Bstrlib alternatives do +not suffer from those sorts of problems. + +The substitute for strncat can be performed with higher performance by using +the blk2tbstr macro to create a presized second operand for bconcat. + +C-library Bstring alternative CBString alternative +--------- ------------------- -------------------- +strspn strspn acceptable strspn acceptable +strcspn strcspn acceptable strcspn acceptable +strnset strnset acceptable strnset acceptable +printf printf acceptable printf acceptable +puts puts acceptable puts acceptable +fprintf fprintf acceptable fprintf acceptable +fputs fputs acceptable fputs acceptable +memcmp memcmp acceptable memcmp acceptable + +Remember that Bstring (and CBstring) functions will automatically append the +'\0' character to the character data buffer. So by simply accessing the data +buffer directly, ordinary C string library functions can be called directly +on them. Note that bstrcmp is not the same as memcmp in exactly the same way +that strcmp is not the same as memcmp. + +C-library Bstring alternative CBString alternative +--------- ------------------- -------------------- +fread balloc + fread ::alloc + fread +fgets balloc + fgets ::alloc + fgets + +These are odd ones because of the exact sizing of the buffer required. The +Bstring and CBString alternatives requires that the buffers are forced to +hold at least the prescribed length, then just use fread or fgets directly. +However, typically the automatic memory management of Bstring and CBstring +will make the typical use of fgets and fread to read specifically sized +strings unnecessary. + +Implementation Choices +---------------------- + +Overhead: +......... + +The bstring library has more overhead versus straight char buffers for most +functions. This overhead is essentially just the memory management and +string header allocation. This overhead usually only shows up for small +string manipulations. The performance loss has to be considered in +light of the following: + +1) What would be the performance loss of trying to write this management + code in one's own application? +2) Since the bstring library source code is given, a sufficiently powerful + modern inlining globally optimizing compiler can remove function call + overhead. + +Since the data type is exposed, a developer can replace any unsatisfactory +function with their own inline implementation. And that is besides the main +point of what the better string library is mainly meant to provide. Any +overhead lost has to be compared against the value of the safe abstraction +for coupling memory management and string functionality. + +Performance of the C interface: +............................... + +The algorithms used have performance advantages versus the analogous C +library functions. For example: + +1. bfromcstr/blk2str/bstrcpy versus strcpy/strdup. By using memmove instead + of strcpy, the break condition of the copy loop is based on an independent + counter (that should be allocated in a register) rather than having to + check the results of the load. Modern out-of-order executing CPUs can + parallelize the final branch mis-predict penality with the loading of the + source string. Some CPUs will also tend to have better built-in hardware + support for counted memory moves than load-compare-store. (This is a + minor, but non-zero gain.) +2. biseq versus strcmp. If the strings are unequal in length, bsiseq will + return in O(1) time. If the strings are aliased, or have aliased data + buffers, biseq will return in O(1) time. strcmp will always be O(k), + where k is the length of the common prefix or the whole string if they are + identical. +3. ->slen versus strlen. ->slen is obviously always O(1), while strlen is + always O(n) where n is the length of the string. +4. bconcat versus strcat. Both rely on precomputing the length of the + destination string argument, which will favor the bstring library. On + iterated concatenations the performance difference can be enormous. +5. bsreadln versus fgets. The bsreadln function reads large blocks at a time + from the given stream, then parses out lines from the buffers directly. + Some C libraries will implement fgets as a loop over single fgetc calls. + Testing indicates that the bsreadln approach can be several times faster + for fast stream devices (such as a file that has been entirely cached.) +6. bsplits/bsplitscb versus strspn. Accelerators for the set of match + characters are generated only once. +7. binstr versus strstr. The binstr implementation unrolls the loops to + help reduce loop overhead. This will matter if the target string is + long and source string is not found very early in the target string. + With strstr, while it is possible to unroll the source contents, it is + not possible to do so with the destination contents in a way that is + effective because every destination character must be tested against + '\0' before proceeding to the next character. +8. bReverse versus strrev. The C function must find the end of the string + first before swaping character pairs. +9. bstrrchr versus no comparable C function. Its not hard to write some C + code to search for a character from the end going backwards. But there + is no way to do this without computing the length of the string with + strlen. + +Practical testing indicates that in general Bstrlib is never signifcantly +slower than the C library for common operations, while very often having a +performance advantage that ranges from significant to massive. Even for +functions like b(n)inchr versus str(c)spn() (where, in theory, there is no +advantage for the Bstrlib architecture) the performance of Bstrlib is vastly +superior to most tested C library implementations. + +Some of Bstrlib's extra functionality also lead to inevitable performance +advantages over typical C solutions. For example, using the blk2tbstr macro, +one can (in O(1) time) generate an internal substring by reference while not +disturbing the original string. If disturbing the original string is not an +option, typically, a comparable char * solution would have to make a copy of +the substring to provide similar functionality. Another example is reverse +character set scanning -- the str(c)spn functions only scan in a forward +direction which can complicate some parsing algorithms. + +Where high performance char * based algorithms are available, Bstrlib can +still leverage them by accessing the ->data field on bstrings. So +realistically Bstrlib can never be significantly slower than any standard +'\0' terminated char * based solutions. + +Performance of the C++ interface: +................................. + +The C++ interface has been designed with an emphasis on abstraction and safety +first. However, since it is substantially a wrapper for the C bstring +functions, for longer strings the performance comments described in the +"Performance of the C interface" section above still apply. Note that the +(CBString *) type can be directly cast to a (bstring) type, and passed as +parameters to the C functions (though a CBString must never be passed to +bdestroy.) + +Probably the most controversial choice is performing full bounds checking on +the [] operator. This decision was made because 1) the fast alternative of +not bounds checking is still available by first casting the CBString to a +(const char *) buffer or to a (struct tagbstring) then derefencing .data and +2) because the lack of bounds checking is seen as one of the main weaknesses +of C/C++ versus other languages. This check being done on every access leads +to individual character extraction being actually slower than other languages +in this one respect (other language's compilers will normally dedicate more +resources on hoisting or removing bounds checking as necessary) but otherwise +bring C++ up to the level of other languages in terms of functionality. + +It is common for other C++ libraries to leverage the abstractions provided by +C++ to use reference counting and "copy on write" policies. While these +techniques can speed up some scenarios, they impose a problem with respect to +thread safety. bstrings and CBStrings can be properly protected with +"per-object" mutexes, meaning that two bstrlib calls can be made and execute +simultaneously, so long as the bstrings and CBstrings are distinct. With a +reference count and alias before copy on write policy, global mutexes are +required that prevent multiple calls to the strings library to execute +simultaneously regardless of whether or not the strings represent the same +string. + +One interesting trade off in CBString is that the default constructor is not +trivial. I.e., it always prepares a ready to use memory buffer. The purpose +is to ensure that there is a uniform internal composition for any functioning +CBString that is compatible with bstrings. It also means that the other +methods in the class are not forced to perform "late initialization" checks. +In the end it means that construction of CBStrings are slower than other +comparable C++ string classes. Initial testing, however, indicates that +CBString outperforms std::string and MFC's CString, for example, in all other +operations. So to work around this weakness it is recommended that CBString +declarations be pushed outside of inner loops. + +Practical testing indicates that with the exception of the caveats given +above (constructors and safe index character manipulations) the C++ API for +Bstrlib generally outperforms popular standard C++ string classes. Amongst +the standard libraries and compilers, the quality of concatenation operations +varies wildly and very little care has gone into search functions. Bstrlib +dominates those performance benchmarks. + +Memory management: +.................. + +The bstring functions which write and modify bstrings will automatically +reallocate the backing memory for the char buffer whenever it is required to +grow. The algorithm for resizing chosen is to snap up to sizes that are a +power of two which are sufficient to hold the intended new size. Memory +reallocation is not performed when the required size of the buffer is +decreased. This behavior can be relied on, and is necessary to make the +behaviour of balloc deterministic. This trades off additional memory usage +for decreasing the frequency for required reallocations: + +1. For any bstring whose size never exceeds n, its buffer is not ever + reallocated more than log_2(n) times for its lifetime. +2. For any bstring whose size never exceeds n, its buffer is never more than + 2*(n+1) in length. (The extra characters beyond 2*n are to allow for the + implicit '\0' which is always added by the bstring modifying functions.) + +Decreasing the buffer size when the string decreases in size would violate 1) +above and in real world case lead to pathological heap thrashing. Similarly, +allocating more tightly than "least power of 2 greater than necessary" would +lead to a violation of 1) and have the same potential for heap thrashing. + +Property 2) needs emphasizing. Although the memory allocated is always a +power of 2, for a bstring that grows linearly in size, its buffer memory also +grows linearly, not exponentially. The reason is that the amount of extra +space increases with each reallocation, which decreases the frequency of +future reallocations. + +Obviously, given that bstring writing functions may reallocate the data +buffer backing the target bstring, one should not attempt to cache the data +buffer address and use it after such bstring functions have been called. +This includes making reference struct tagbstrings which alias to a writable +bstring. + +balloc or bfromcstralloc can be used to preallocate the minimum amount of +space used for a given bstring. This will reduce even further the number of +times the data portion is reallocated. If the length of the string is never +more than one less than the memory length then there will be no further +reallocations. + +Note that invoking the bwriteallow macro may increase the number of reallocs +by one more than necessary for every call to bwriteallow interleaved with any +bstring API which writes to this bstring. + +The library does not use any mechanism for automatic clean up for the C API. +Thus explicit clean up via calls to bdestroy() are required to avoid memory +leaks. + +Constant and static tagbstrings: +................................ + +A struct tagbstring can be write protected from any bstrlib function using +the bwriteprotect macro. A write protected struct tagbstring can then be +reset to being writable via the bwriteallow macro. There is, of course, no +protection from attempts to directly access the bstring members. Modifying a +bstring which is write protected by direct access has undefined behavior. + +static struct tagbstrings can be declared via the bsStatic macro. They are +considered permanently unwritable. Such struct tagbstrings's are declared +such that attempts to write to it are not well defined. Invoking either +bwriteallow or bwriteprotect on static struct tagbstrings has no effect. + +struct tagbstring's initialized via btfromcstr or blk2tbstr are protected by +default but can be made writeable via the bwriteallow macro. If bwriteallow +is called on such struct tagbstring's, it is the programmer's responsibility +to ensure that: + +1) the buffer supplied was allocated from the heap. +2) bdestroy is not called on this tagbstring (unless the header itself has + also been allocated from the heap.) +3) free is called on the buffer to reclaim its memory. + +bwriteallow and bwriteprotect can be invoked on ordinary bstrings (they have +to be dereferenced with the (*) operator to get the levels of indirection +correct) to give them write protection. + +Buffer declaration: +................... + +The memory buffer is actually declared "unsigned char *" instead of "char *". +The reason for this is to trigger compiler warnings whenever uncasted char +buffers are assigned to the data portion of a bstring. This will draw more +diligent programmers into taking a second look at the code where they +have carelessly left off the typically required cast. (Research from +AT&T/Lucent indicates that additional programmer eyeballs is one of the most +effective mechanisms at ferreting out bugs.) + +Function pointers: +.................. + +The bgets, bread and bStream functions use function pointers to obtain +strings from data streams. The function pointer declarations have been +specifically chosen to be compatible with the fgetc and fread functions. +While this may seem to be a convoluted way of implementing fgets and fread +style functionality, it has been specifically designed this way to ensure +that there is no dependency on a single narrowly defined set of device +interfaces, such as just stream I/O. In the embedded world, its quite +possible to have environments where such interfaces may not exist in the +standard C library form. Furthermore, the generalization that this opens up +allows for more sophisticated uses for these functions (performing an fgets +like function on a socket, for example.) By using function pointers, it also +allows such abstract stream interfaces to be created using the bstring library +itself while not creating a circular dependency. + +Use of int's for sizes: +....................... + +This is just a recognition that 16bit platforms with requirements for strings +that are larger than 64K and 32bit+ platforms with requirements for strings +that are larger than 4GB are pretty marginal. The main focus is for 32bit +platforms, and emerging 64bit platforms with reasonable < 4GB string +requirements. Using ints allows for negative values which has meaning +internally to bstrlib. + +Semantic consideration: +....................... + +Certain care needs to be taken when copying and aliasing bstrings. A bstring +is essentially a pointer type which points to a multipart abstract data +structure. Thus usage, and lifetime of bstrings have semantics that follow +these considerations. For example: + + bstring a, b; + struct tagbstring t; + + a = bfromcstr("Hello"); /* Create new bstring and copy "Hello" into it. */ + b = a; /* Alias b to the contents of a. */ + t = *a; /* Create a current instance pseudo-alias of a. */ + bconcat (a, b); /* Double a and b, t is now undefined. */ + bdestroy (a); /* Destroy the contents of both a and b. */ + +Variables of type bstring are really just references that point to real +bstring objects. The equal operator (=) creates aliases, and the asterisk +dereference operator (*) creates a kind of alias to the current instance (which +is generally not useful for any purpose.) Using bstrcpy() is the correct way +of creating duplicate instances. The ampersand operator (&) is useful for +creating aliases to struct tagbstrings (remembering that constructed struct +tagbstrings are not writable by default.) + +CBStrings use complete copy semantics for the equal operator (=), and thus do +not have these sorts of issues. + +Debugging: +.......... + +Bstrings have a simple, exposed definition and construction, and the library +itself is open source. So most debugging is going to be fairly straight- +forward. But the memory for bstrings come from the heap, which can often be +corrupted indirectly, and it might not be obvious what has happened even from +direct examination of the contents in a debugger or a core dump. There are +some tools such as Purify, Insure++ and Electric Fence which can help solve +such problems, however another common approach is to directly instrument the +calls to malloc, realloc, calloc, free, memcpy, memmove and/or other calls +by overriding them with macro definitions. + +Although the user could hack on the Bstrlib sources directly as necessary to +perform such an instrumentation, Bstrlib comes with a built-in mechanism for +doing this. By defining the macro BSTRLIB_MEMORY_DEBUG and providing an +include file named memdbg.h this will force the core Bstrlib modules to +attempt to include this file. In such a file, macros could be defined which +overrides Bstrlib's useage of the C standard library. + +Rather than calling malloc, realloc, free, memcpy or memmove directly, Bstrlib +emits the macros bstr__alloc, bstr__realloc, bstr__free, bstr__memcpy and +bstr__memmove in their place respectively. By default these macros are simply +assigned to be equivalent to their corresponding C standard library function +call. However, if they are given earlier macro definitions (via the back +door include file) they will not be given their default definition. In this +way Bstrlib's interface to the standard library can be changed but without +having to directly redefine or link standard library symbols (both of which +are not strictly ANSI C compliant.) + +An example definition might include: + + #define bstr__alloc(sz) X_malloc ((sz), __LINE__, __FILE__) + +which might help contextualize heap entries in a debugging environment. + +The NULL parameter and sanity checking of bstrings is part of the Bstrlib +API, and thus Bstrlib itself does not present any different modes which would +correspond to "Debug" or "Release" modes. Bstrlib always contains mechanisms +which one might think of as debugging features, but retains the performance +and small memory footprint one would normally associate with release mode +code. + +Integration Microsoft's Visual Studio debugger: +............................................... + +Microsoft's Visual Studio debugger has a capability of customizable mouse +float over data type descriptions. This is accomplished by editting the +AUTOEXP.DAT file to include the following: + + ; new for CBString + tagbstring =slen=<slen> mlen=<mlen> <data,st> + Bstrlib::CBStringList =count=<size()> + +In Visual C++ 6.0 this file is located in the directory: + + C:\Program Files\Microsoft Visual Studio\Common\MSDev98\Bin + +and in Visual Studio .NET 2003 its located here: + + C:\Program Files\Microsoft Visual Studio .NET 2003\Common7\Packages\Debugger + +This will improve the ability of debugging with Bstrlib under Visual Studio. + +Security +-------- + +Bstrlib does not come with explicit security features outside of its fairly +comprehensive error detection, coupled with its strict semantic support. +That is to say that certain common security problems, such as buffer overrun, +constant overwrite, arbitrary truncation etc, are far less likely to happen +inadvertently. Where it does help, Bstrlib maximizes its advantage by +providing developers a simple adoption path that lets them leave less secure +string mechanisms behind. The library will not leave developers wanting, so +they will be less likely to add new code using a less secure string library +to add functionality that might be missing from Bstrlib. + +That said there are a number of security ideas not addressed by Bstrlib: + +1. Race condition exploitation (i.e., verifying a string's contents, then +raising the privilege level and execute it as a shell command as two +non-atomic steps) is well beyond the scope of what Bstrlib can provide. It +should be noted that MFC's built-in string mutex actually does not solve this +problem either -- it just removes immediate data corruption as a possible +outcome of such exploit attempts (it can be argued that this is worse, since +it will leave no trace of the exploitation). In general race conditions have +to be dealt with by careful design and implementation; it cannot be assisted +by a string library. + +2. Any kind of access control or security attributes to prevent usage in +dangerous interfaces such as system(). Perl includes a "trust" attribute +which can be endowed upon strings that are intended to be passed to such +dangerous interfaces. However, Perl's solution reflects its own limitations +-- notably that it is not a strongly typed language. In the example code for +Bstrlib, there is a module called taint.cpp. It demonstrates how to write a +simple wrapper class for managing "untainted" or trusted strings using the +type system to prevent questionable mixing of ordinary untrusted strings with +untainted ones then passing them to dangerous interfaces. In this way the +security correctness of the code reduces to auditing the direct usages of +dangerous interfaces or promotions of tainted strings to untainted ones. + +3. Encryption of string contents is way beyond the scope of Bstrlib. +Maintaining encrypted string contents in the futile hopes of thwarting things +like using system-level debuggers to examine sensitive string data is likely +to be a wasted effort (imagine a debugger that runs at a higher level than a +virtual processor where the application runs). For more standard encryption +usages, since the bstring contents are simply binary blocks of data, this +should pose no problem for usage with other standard encryption libraries. + +Compatibility +------------- + +The Better String Library is known to compile and function correctly with the +following compilers: + + - Microsoft Visual C++ + - Watcom C/C++ + - Intel's C/C++ compiler (Windows) + - The GNU C/C++ compiler (cygwin and Linux on PPC64) + - Borland C + - Turbo C + +Setting of configuration options should be unnecessary for these compilers +(unless exceptions are being disabled or STLport has been added to WATCOM +C/C++). Bstrlib has been developed with an emphasis on portability. As such +porting it to other compilers should be straight forward. This package +includes a porting guide (called porting.txt) which explains what issues may +exist for porting Bstrlib to different compilers and environments. + +ANSI issues +----------- + +1. The function pointer types bNgetc and bNread have prototypes which are very +similar to, but not exactly the same as fgetc and fread respectively. +Basically the FILE * parameter is replaced by void *. The purpose of this +was to allow one to create other functions with fgetc and fread like +semantics without being tied to ANSI C's file streaming mechanism. I.e., one +could very easily adapt it to sockets, or simply reading a block of memory, +or procedurally generated strings (for fractal generation, for example.) + +The problem is that invoking the functions (bNgetc)fgetc and (bNread)fread is +not technically legal in ANSI C. The reason being that the compiler is only +able to coerce the function pointers themselves into the target type, however +are unable to perform any cast (implicit or otherwise) on the parameters +passed once invoked. I.e., if internally void * and FILE * need some kind of +mechanical coercion, the compiler will not properly perform this conversion +and thus lead to undefined behavior. + +Apparently a platform from Data General called "Eclipse" and another from +Tandem called "NonStop" have a different representation for pointers to bytes +and pointers to words, for example, where coercion via casting is necessary. +(Actual confirmation of the existence of such machines is hard to come by, so +it is prudent to be skeptical about this information.) However, this is not +an issue for any known contemporary platforms. One may conclude that such +platforms are effectively apocryphal even if they do exist. + +To correctly work around this problem to the satisfaction of the ANSI +limitations, one needs to create wrapper functions for fgets and/or +fread with the prototypes of bNgetc and/or bNread respectively which performs +no other action other than to explicitely cast the void * parameter to a +FILE *, and simply pass the remaining parameters straight to the function +pointer call. + +The wrappers themselves are trivial: + + size_t freadWrap (void * buff, size_t esz, size_t eqty, void * parm) { + return fread (buff, esz, eqty, (FILE *) parm); + } + + int fgetcWrap (void * parm) { + return fgetc ((FILE *) parm); + } + +These have not been supplied in bstrlib or bstraux to prevent unnecessary +linking with file I/O functions. + +2. vsnprintf is not available on all compilers. Because of this, the bformat +and bformata functions (and format and formata methods) are not guaranteed to +work properly. For those compilers that don't have vsnprintf, the +BSTRLIB_NOVSNP macro should be set before compiling bstrlib, and the format +functions/method will be disabled. + +The more recent ANSI C standards have specified the required inclusion of a +vsnprintf function. + +3. The bstrlib function names are not unique in the first 6 characters. This +is only an issue for older C compiler environments which do not store more +than 6 characters for function names. + +4. The bsafe module defines macros and function names which are part of the +C library. This simply overrides the definition as expected on all platforms +tested, however it is not sanctioned by the ANSI standard. This module is +clearly optional and should be omitted on platforms which disallow its +undefined semantics. + +In practice the real issue is that some compilers in some modes of operation +can/will inline these standard library functions on a module by module basis +as they appear in each. The linker will thus have no opportunity to override +the implementation of these functions for those cases. This can lead to +inconsistent behaviour of the bsafe module on different platforms and +compilers. + +=============================================================================== + +Comparison with Microsoft's CString class +----------------------------------------- + +Although developed independently, CBStrings have very similar functionality to +Microsoft's CString class. However, the bstring library has significant +advantages over CString: + +1. Bstrlib is a C-library as well as a C++ library (using the C++ wrapper). + + - Thus it is compatible with more programming environments and + available to a wider population of programmers. + +2. The internal structure of a bstring is considered exposed. + + - A single contiguous block of data can be cut into read-only pieces by + simply creating headers, without allocating additional memory to create + reference copies of each of these sub-strings. + - In this way, using bstrings in a totally abstracted way becomes a choice + rather than an imposition. Further this choice can be made differently + at different layers of applications that use it. + +3. Static declaration support precludes the need for constructor + invocation. + + - Allows for static declarations of constant strings that has no + additional constructor overhead. + +4. Bstrlib is not attached to another library. + + - Bstrlib is designed to be easily plugged into any other library + collection, without dependencies on other libraries or paradigms (such + as "MFC".) + +The bstring library also comes with a few additional functions that are not +available in the CString class: + + - bsetstr + - bsplit + - bread + - breplace (this is different from CString::Replace()) + - Writable indexed characters (for example a[i]='x') + +Interestingly, although Microsoft did implement mid$(), left$() and right$() +functional analogues (these are functions from GWBASIC) they seem to have +forgotten that mid$() could be also used to write into the middle of a string. +This functionality exists in Bstrlib with the bsetstr() and breplace() +functions. + +Among the disadvantages of Bstrlib is that there is no special support for +localization or wide characters. Such things are considered beyond the scope +of what bstrings are trying to deliver. CString essentially supports the +older UCS-2 version of Unicode via widechar_t as an application-wide compile +time switch. + +CString's also use built-in mechanisms for ensuring thread safety under all +situations. While this makes writing thread safe code that much easier, this +built-in safety feature has a price -- the inner loops of each CString method +runs in its own critical section (grabbing and releasing a light weight mutex +on every operation.) The usual way to decrease the impact of a critical +section performance penalty is to amortize more operations per critical +section. But since the implementation of CStrings is fixed as a one critical +section per-operation cost, there is no way to leverage this common +performance enhancing idea. + +The search facilities in Bstrlib are comparable to those in MFC's CString +class, though it is missing locale specific collation. But because Bstrlib +is interoperable with C's char buffers, it will allow programmers to write +their own string searching mechanism (such as Boyer-Moore), or be able to +choose from a variety of available existing string searching libraries (such +as those for regular expressions) without difficulty. + +Microsoft used a very non-ANSI conforming trick in its implementation to +allow printf() to use the "%s" specifier to output a CString correctly. This +can be convenient, but it is inherently not portable. CBString requires an +explicit cast, while bstring requires the data member to be dereferenced. +Microsoft's own documentation recommends casting, instead of relying on this +feature. + +Comparison with C++'s std::string +--------------------------------- + +This is the C++ language's standard STL based string class. + +1. There is no C implementation. +2. The [] operator is not bounds checked. +3. Missing a lot of useful functions like printf-like formatting. +4. Some sub-standard std::string implementations (SGI) are necessarily unsafe + to use with multithreading. +5. Limited by STL's std::iostream which in turn is limited by ifstream which + can only take input from files. (Compare to CBStream's API which can take + abstracted input.) +6. Extremely uneven performance across implementations. + +Comparison with ISO C TR 24731 proposal +--------------------------------------- + +Following the ISO C99 standard, Microsoft has proposed a group of C library +extensions which are supposedly "safer and more secure". This proposal is +expected to be adopted by the ISO C standard which follows C99. + +The proposal reveals itself to be very similar to Microsoft's "StrSafe" +library. The functions are basically the same as other standard C library +string functions except that destination parameters are paired with an +additional length parameter of type rsize_t. rsize_t is the same as size_t, +however, the range is checked to make sure its between 1 and RSIZE_MAX. Like +Bstrlib, the functions perform a "parameter check". Unlike Bstrlib, when a +parameter check fails, rather than simply outputing accumulatable error +statuses, they call a user settable global error function handler, and upon +return of control performs no (additional) detrimental action. The proposal +covers basic string functions as well as a few non-reenterable functions +(asctime, ctime, and strtok). + +1. Still based solely on char * buffers (and therefore strlen() and strcat() + is still O(n), and there are no faster streq() comparison functions.) +2. No growable string semantics. +3. Requires manual buffer length synchronization in the source code. +4. No attempt to enhance functionality of the C library. +5. Introduces a new error scenario (strings exceeding RSIZE_MAX length). + +The hope is that by exposing the buffer length requirements there will be +fewer buffer overrun errors. However, the error modes are really just +transformed, rather than removed. The real problem of buffer overflows is +that they all happen as a result of erroneous programming. So forcing +programmers to manually deal with buffer limits, will make them more aware of +the problem but doesn't remove the possibility of erroneous programming. So +a programmer that erroneously mixes up the rsize_t parameters is no better off +from a programmer that introduces potential buffer overflows through other +more typical lapses. So at best this may reduce the rate of erroneous +programming, rather than making any attempt at removing failure modes. + +The error handler can discriminate between types of failures, but does not +take into account any callsite context. So the problem is that the error is +going to be manifest in a piece of code, but there is no pointer to that +code. It would seem that passing in the call site __FILE__, __LINE__ as +parameters would be very useful, but the API clearly doesn't support such a +thing (it would increase code bloat even more than the extra length +parameter does, and would require macro tricks to implement). + +The Bstrlib C API takes the position that error handling needs to be done at +the callsite, and just tries to make it as painless as possible. Furthermore, +error modes are removed by supporting auto-growing strings and aliasing. For +capturing errors in more central code fragments, Bstrlib's C++ API uses +exception handling extensively, which is superior to the leaf-only error +handler approach. + +Comparison with Managed String Library CERT proposal +---------------------------------------------------- + +The main webpage for the managed string library: +http://www.cert.org/secure-coding/managedstring.html + +Robert Seacord at CERT has proposed a C string library that he calls the +"Managed String Library" for C. Like Bstrlib, it introduces a new type +which is called a managed string. The structure of a managed string +(string_m) is like a struct tagbstring but missing the length field. This +internal structure is considered opaque. The length is, like the C standard +library, always computed on the fly by searching for a terminating NUL on +every operation that requires it. So it suffers from every performance +problem that the C standard library suffers from. Interoperating with C +string APIs (like printf, fopen, or anything else that takes a string +parameter) requires copying to additionally allocating buffers that have to +be manually freed -- this makes this library probably slower and more +cumbersome than any other string library in existence. + +The library gives a fully populated error status as the return value of every +string function. The hope is to be able to diagnose all problems +specifically from the return code alone. Comparing this to Bstrlib, which +aways returns one consistent error message, might make it seem that Bstrlib +would be harder to debug; but this is not true. With Bstrlib, if an error +occurs there is always enough information from just knowing there was an error +and examining the parameters to deduce exactly what kind of error has +happened. The managed string library thus gives up nested function calls +while achieving little benefit, while Bstrlib does not. + +One interesting feature that "managed strings" has is the idea of data +sanitization via character set whitelisting. That is to say, a globally +definable filter that makes any attempt to put invalid characters into strings +lead to an error and not modify the string. The author gives the following +example: + + // create valid char set + if (retValue = strcreate_m(&str1, "abc") ) { + fprintf( + stderr, + "Error %d from strcreate_m.\n", + retValue + ); + } + if (retValue = setcharset(str1)) { + fprintf( + stderr, + "Error %d from setcharset().\n", + retValue + ); + } + if (retValue = strcreate_m(&str1, "aabbccabc")) { + fprintf( + stderr, + "Error %d from strcreate_m.\n", + retValue + ); + } + // create string with invalid char set + if (retValue = strcreate_m(&str1, "abbccdabc")) { + fprintf( + stderr, + "Error %d from strcreate_m.\n", + retValue + ); + } + +Which we can compare with a more Bstrlib way of doing things: + + bstring bCreateWithFilter (const char * cstr, const_bstring filter) { + bstring b = bfromcstr (cstr); + if (BSTR_ERR != bninchr (b, filter) && NULL != b) { + fprintf (stderr, "Filter violation.\n"); + bdestroy (b); + b = NULL; + } + return b; + } + + struct tagbstring charFilter = bsStatic ("abc"); + bstring str1 = bCreateWithFilter ("aabbccabc", &charFilter); + bstring str2 = bCreateWithFilter ("aabbccdabc", &charFilter); + +The first thing we should notice is that with the Bstrlib approach you can +have different filters for different strings if necessary. Furthermore, +selecting a charset filter in the Managed String Library is uni-contextual. +That is to say, there can only be one such filter active for the entire +program, which means its usage is not well defined for intermediate library +usage (a library that uses it will interfere with user code that uses it, and +vice versa.) It is also likely to be poorly defined in multi-threading +environments. + +There is also a question as to whether the data sanitization filter is checked +on every operation, or just on creation operations. Since the charset can be +set arbitrarily at run time, it might be set *after* some managed strings have +been created. This would seem to imply that all functions should run this +additional check every time if there is an attempt to enforce this. This +would make things tremendously slow. On the other hand, if it is assumed that +only creates and other operations that take char *'s as input need be checked +because the charset was only supposed to be called once at and before any +other managed string was created, then one can see that its easy to cover +Bstrlib with equivalent functionality via a few wrapper calls such as the +example given above. + +And finally we have to question the value of sanitation in the first place. +For example, for httpd servers, there is generally a requirement that the +URLs parsed have some form that avoids undesirable translation to local file +system filenames or resources. The problem is that the way URLs can be +encoded, it must be completely parsed and translated to know if it is using +certain invalid character combinations. That is to say, merely filtering +each character one at a time is not necessarily the right way to ensure that +a string has safe contents. + +In the article that describes this proposal, it is claimed that it fairly +closely approximates the existing C API semantics. On this point we should +compare this "closeness" with Bstrlib: + + Bstrlib Managed String Library + ------- ---------------------- + +Pointer arithmetic Segment arithmetic N/A + +Use in C Std lib ->data, or bdata{e} getstr_m(x,*) ... free(x) + +String literals bsStatic, bsStaticBlk strcreate_m() + +Transparency Complete None + +Its pretty clear that the semantic mapping from C strings to Bstrlib is fairly +straightforward, and that in general semantic capabilities are the same or +superior in Bstrlib. On the other hand the Managed String Library is either +missing semantics or changes things fairly significantly. + +Comparison with Annexia's c2lib library +--------------------------------------- + +This library is available at: +http://www.annexia.org/freeware/c2lib + +1. Still based solely on char * buffers (and therefore strlen() and strcat() + is still O(n), and there are no faster streq() comparison functions.) + Their suggestion that alternatives which wrap the string data type (such as + bstring does) imposes a difficulty in interoperating with the C langauge's + ordinary C string library is not founded. +2. Introduction of memory (and vector?) abstractions imposes a learning + curve, and some kind of memory usage policy that is outside of the strings + themselves (and therefore must be maintained by the developer.) +3. The API is massive, and filled with all sorts of trivial (pjoin) and + controvertial (pmatch -- regular expression are not sufficiently + standardized, and there is a very large difference in performance between + compiled and non-compiled, REs) functions. Bstrlib takes a decidely + minimal approach -- none of the functionality in c2lib is difficult or + challenging to implement on top of Bstrlib (except the regex stuff, which + is going to be difficult, and controvertial no matter what.) +4. Understanding why c2lib is the way it is pretty much requires a working + knowledge of Perl. bstrlib requires only knowledge of the C string library + while providing just a very select few worthwhile extras. +5. It is attached to a lot of cruft like a matrix math library (that doesn't + include any functions for getting the determinant, eigenvectors, + eigenvalues, the matrix inverse, test for singularity, test for + orthogonality, a grahm schmit orthogonlization, LU decomposition ... I + mean why bother?) + +Convincing a development house to use c2lib is likely quite difficult. It +introduces too much, while not being part of any kind of standards body. The +code must therefore be trusted, or maintained by those that use it. While +bstring offers nothing more on this front, since its so much smaller, covers +far less in terms of scope, and will typically improve string performance, +the barrier to usage should be much smaller. + +Comparison with stralloc/qmail +------------------------------ + +More information about this library can be found here: +http://www.canonical.org/~kragen/stralloc.html or here: +http://cr.yp.to/lib/stralloc.html + +1. Library is very very minimal. A little too minimal. +2. Untargetted source parameters are not declared const. +3. Slightly different expected emphasis (like _cats function which takes an + ordinary C string char buffer as a parameter.) Its clear that the + remainder of the C string library is still required to perform more + useful string operations. + +The struct declaration for their string header is essentially the same as that +for bstring. But its clear that this was a quickly written hack whose goals +are clearly a subset of what Bstrlib supplies. For anyone who is served by +stralloc, Bstrlib is complete substitute that just adds more functionality. + +stralloc actually uses the interesting policy that a NULL data pointer +indicates an empty string. In this way, non-static empty strings can be +declared without construction. This advantage is minimal, since static empty +bstrings can be declared inline without construction, and if the string needs +to be written to it should be constructed from an empty string (or its first +initializer) in any event. + +wxString class +-------------- + +This is the string class used in the wxWindows project. A description of +wxString can be found here: +http://www.wxwindows.org/manuals/2.4.2/wx368.htm#wxstring + +This C++ library is similar to CBString. However, it is littered with +trivial functions (IsAscii, UpperCase, RemoveLast etc.) + +1. There is no C implementation. +2. The memory management strategy is to allocate a bounded fixed amount of + additional space on each resize, meaning that it does not have the + log_2(n) property that Bstrlib has (it will thrash very easily, cause + massive fragmentation in common heap implementations, and can easily be a + common source of performance problems). +3. The library uses a "copy on write" strategy, meaning that it has to deal + with multithreading problems. + +Vstr +---- + +This is a highly orthogonal C string library with an emphasis on +networking/realtime programming. It can be found here: +http://www.and.org/vstr/ + +1. The convoluted internal structure does not contain a '\0' char * compatible + buffer, so interoperability with the C library a non-starter. +2. The API and implementation is very large (owing to its orthogonality) and + can lead to difficulty in understanding its exact functionality. +3. An obvious dependency on gnu tools (confusing make configure step) +4. Uses a reference counting system, meaning that it is not likely to be + thread safe. + +The implementation has an extreme emphasis on performance for nontrivial +actions (adds, inserts and deletes are all constant or roughly O(#operations) +time) following the "zero copy" principle. This trades off performance of +trivial functions (character access, char buffer access/coersion, alias +detection) which becomes significantly slower, as well as incremental +accumulative costs for its searching/parsing functions. Whether or not Vstr +wins any particular performance benchmark will depend a lot on the benchmark, +but it should handily win on some, while losing dreadfully on others. + +The learning curve for Vstr is very steep, and it doesn't come with any +obvious way to build for Windows or other platforms without gnu tools. At +least one mechanism (the iterator) introduces a new undefined scenario +(writing to a Vstr while iterating through it.) Vstr has a very large +footprint, and is very ambitious in its total functionality. Vstr has no C++ +API. + +Vstr usage requires context initialization via vstr_init() which must be run +in a thread-local context. Given the totally reference based architecture +this means that sharing Vstrings across threads is not well defined, or at +least not safe from race conditions. This API is clearly geared to the older +standard of fork() style multitasking in UNIX, and is not safely transportable +to modern shared memory multithreading available in Linux and Windows. There +is no portable external solution making the library thread safe (since it +requires a mutex around each Vstr context -- not each string.) + +In the documentation for this library, a big deal is made of its self hosted +s(n)printf-like function. This is an issue for older compilers that don't +include vsnprintf(), but also an issue because Vstr has a slow conversion to +'\0' terminated char * mechanism. That is to say, using "%s" to format data +that originates from Vstr would be slow without some sort of native function +to do so. Bstrlib sidesteps the issue by relying on what snprintf-like +functionality does exist and having a high performance conversion to a char * +compatible string so that "%s" can be used directly. + +Str Library +----------- + +This is a fairly extensive string library, that includes full unicode support +and targetted at the goal of out performing MFC and STL. The architecture, +similarly to MFC's CStrings, is a copy on write reference counting mechanism. + +http://www.utilitycode.com/str/default.aspx + +1. Commercial. +2. C++ only. + +This library, like Vstr, uses a ref counting system. There is only so deeply +I can analyze it, since I don't have a license for it. However, performance +improvements over MFC's and STL, doesn't seem like a sufficient reason to +move your source base to it. For example, in the future, Microsoft may +improve the performance CString. + +It should be pointed out that performance testing of Bstrlib has indicated +that its relative performance advantage versus MFC's CString and STL's +std::string is at least as high as that for the Str library. + +libmib astrings +--------------- + +A handful of functional extensions to the C library that add dynamic string +functionality. +http://www.mibsoftware.com/libmib/astring/ + +This package basically references strings through char ** pointers and assumes +they are pointing to the top of an allocated heap entry (or NULL, in which +case memory will be newly allocated from the heap.) So its still up to user +to mix and match the older C string functions with these functions whenever +pointer arithmetic is used (i.e., there is no leveraging of the type system +to assert semantic differences between references and base strings as Bstrlib +does since no new types are introduced.) Unlike Bstrlib, exact string length +meta data is not stored, thus requiring a strlen() call on *every* string +writing operation. The library is very small, covering only a handful of C's +functions. + +While this is better than nothing, it is clearly slower than even the +standard C library, less safe and less functional than Bstrlib. + +To explain the advantage of using libmib, their website shows an example of +how dangerous C code: + + char buf[256]; + char *pszExtraPath = ";/usr/local/bin"; + + strcpy(buf,getenv("PATH")); /* oops! could overrun! */ + strcat(buf,pszExtraPath); /* Could overrun as well! */ + + printf("Checking...%s\n",buf); /* Some printfs overrun too! */ + +is avoided using libmib: + + char *pasz = 0; /* Must initialize to 0 */ + char *paszOut = 0; + char *pszExtraPath = ";/usr/local/bin"; + + if (!astrcpy(&pasz,getenv("PATH"))) /* malloc error */ exit(-1); + if (!astrcat(&pasz,pszExtraPath)) /* malloc error */ exit(-1); + + /* Finally, a "limitless" printf! we can use */ + asprintf(&paszOut,"Checking...%s\n",pasz);fputs(paszOut,stdout); + + astrfree(&pasz); /* Can use free(pasz) also. */ + astrfree(&paszOut); + +However, compare this to Bstrlib: + + bstring b, out; + + bcatcstr (b = bfromcstr (getenv ("PATH")), ";/usr/local/bin"); + out = bformat ("Checking...%s\n", bdatae (b, "<Out of memory>")); + /* if (out && b) */ fputs (bdatae (out, "<Out of memory>"), stdout); + bdestroy (b); + bdestroy (out); + +Besides being shorter, we can see that error handling can be deferred right +to the very end. Also, unlike the above two versions, if getenv() returns +with NULL, the Bstrlib version will not exhibit undefined behavior. +Initialization starts with the relevant content rather than an extra +autoinitialization step. + +libclc +------ + +An attempt to add to the standard C library with a number of common useful +functions, including additional string functions. +http://libclc.sourceforge.net/ + +1. Uses standard char * buffer, and adopts C 99's usage of "restrict" to pass + the responsibility to guard against aliasing to the programmer. +2. Adds no safety or memory management whatsoever. +3. Most of the supplied string functions are completely trivial. + +The goals of libclc and Bstrlib are clearly quite different. + +fireString +---------- + +http://firestuff.org/ + +1. Uses standard char * buffer, and adopts C 99's usage of "restrict" to pass + the responsibility to guard against aliasing to the programmer. +2. Mixes char * and length wrapped buffers (estr) functions, doubling the API + size, with safety limited to only half of the functions. + +Firestring was originally just a wrapper of char * functionality with extra +length parameters. However, it has been augmented with the inclusion of the +estr type which has similar functionality to stralloc. But firestring does +not nearly cover the functional scope of Bstrlib. + +Safe C String Library +--------------------- + +A library written for the purpose of increasing safety and power to C's string +handling capabilities. +http://www.zork.org/safestr/safestr.html + +1. While the safestr_* functions are safe in of themselves, interoperating + with char * string has dangerous unsafe modes of operation. +2. The architecture of safestr's causes the base pointer to change. Thus, + its not practical/safe to store a safestr in multiple locations if any + single instance can be manipulated. +3. Dependent on an additional error handling library. +4. Uses reference counting, meaning that it is either not thread safe or + slow and not portable. + +I think the idea of reallocating (and hence potentially changing) the base +pointer is a serious design flaw that is fatal to this architecture. True +safety is obtained by having automatic handling of all common scenarios +without creating implicit constraints on the user. + +Because of its automatic temporary clean up system, it cannot use "const" +semantics on input arguments. Interesting anomolies such as: + + safestr_t s, t; + s = safestr_replace (t = SAFESTR_TEMP ("This is a test"), + SAFESTR_TEMP (" "), SAFESTR_TEMP (".")); + /* t is now undefined. */ + +are possible. If one defines a function which takes a safestr_t as a +parameter, then the function would not know whether or not the safestr_t is +defined after it passes it to a safestr library function. The author +recommended method for working around this problem is to examine the +attributes of the safestr_t within the function which is to modify any of +its parameters and play games with its reference count. I think, therefore, +that the whole SAFESTR_TEMP idea is also fatally broken. + +The library implements immutability, optional non-resizability, and a "trust" +flag. This trust flag is interesting, and suggests that applying any +arbitrary sequence of safestr_* function calls on any set of trusted strings +will result in a trusted string. It seems to me, however, that if one wanted +to implement a trusted string semantic, one might do so by actually creating +a different *type* and only implement the subset of string functions that are +deemed safe (i.e., user input would be excluded, for example.) This, in +essence, would allow the compiler to enforce trust propogation at compile +time rather than run time. Non-resizability is also interesting, however, +it seems marginal (i.e., to want a string that cannot be resized, yet can be +modified and yet where a fixed sized buffer is undesirable.) + +Libsrt +------ + +This is a length based string library based on a slightly different strategy. +The string contents are appended to the end of the header directly so strings +only require a single allocation. However, whenever a reallocation occurs, +the header is replicated and the base pointer for the string is changed. +That means references to the string are only valid so long as they are not +resized after any such reference is cached. The internal structure maintains +a lot some state used to accelerate unicode manipulation. This state is +dynamically updated according to usage (so, like Bstrlib, it supports both +a binary mode and a Unicode mode basically all the time). But this makes +sustainable usage of the library essentially opaque. This also creates a +bottleneck for whatever extensions to the library one desires (write all +extensions on top of the base library, put in a request to the author, or +dedicate an expert to learn the internals of the library). + +SDS +--- + +Sds uses a strategy very similar to Libsrt. However, it uses some dynamic +headers to decrease the overhead for very small strings. This requires an +extra switch statement for access to each string attribute. The source code +appears to use gcc/clang extensions, and thus it is not portable. + +=============================================================================== + +Examples +-------- + + Dumping a line numbered file: + + FILE * fp; + int i, ret; + struct bstrList * lines; + struct tagbstring prefix = bsStatic ("-> "); + + if (NULL != (fp = fopen ("bstrlib.txt", "rb"))) { + bstring b = bread ((bNread) fread, fp); + fclose (fp); + if (NULL != (lines = bsplit (b, '\n'))) { + for (i=0; i < lines->qty; i++) { + binsert (lines->entry[i], 0, &prefix, '?'); + printf ("%04d: %s\n", i, bdatae (lines->entry[i], "NULL")); + } + bstrListDestroy (lines); + } + bdestroy (b); + } + +For numerous other examples, see bstraux.c, bstraux.h and the example archive. + +=============================================================================== + +License +------- + +The Better String Library is available under either the BSD license (see the +accompanying license.txt) or the Gnu Public License version 2 (see the +accompanying gpl.txt) at the option of the user. + +=============================================================================== + +Acknowledgements +---------------- + +The following individuals have made significant contributions to the design +and testing of the Better String Library: + +Bjorn Augestad +Clint Olsen +Darryl Bleau +Fabian Cenedese +Graham Wideman +Ignacio Burgueno +International Business Machines Corporation +Ira Mica +John Kortink +Manuel Woelker +Marcel van Kervinck +Michael Hsieh +Mike Steinert +Richard A. Smith +Simon Ekstrom +Wayne Scott +Zed A. Shaw + +=============================================================================== |
