NAME

       PCRE - Perl-compatible regular expressions


PCRE BUILD-TIME OPTIONS


       This  document  describes  the  optional  features  of PCRE that can be
       selected when the library is compiled. They are all selected, or  dese-
       lected, by providing options to the configure script that is run before
       the make command. The complete list of  options  for  configure  (which
       includes  the  standard  ones such as the selection of the installation
       directory) can be obtained by running

         ./configure --help

       The following sections describe certain options whose names begin  with
       --enable  or  --disable. These settings specify changes to the defaults
       for the configure command. Because of the  way  that  configure  works,
       --enable  and  --disable  always  come  in  pairs, so the complementary
       option always exists as well, but as it specifies the  default,  it  is
       not described.


C++ SUPPORT


       By default, the configure script will search for a C++ compiler and C++
       header files. If it finds them, it automatically builds the C++ wrapper
       library for PCRE. You can disable this by adding

         --disable-cpp

       to the configure command.


UTF-8 SUPPORT


       To build PCRE with support for UTF-8 character strings, add

         --enable-utf8

       to  the  configure  command.  On its own, this does not make PCRE treat
       strings as UTF-8: as well as compiling PCRE with this option, you  also
       have to set the PCRE_UTF8 option when you call the pcre_compile() func-
       tion.


UNICODE CHARACTER PROPERTY SUPPORT


       UTF-8 support allows PCRE to process character values greater than  255
       in  the  strings that it handles. On its own, however, it does not pro-
       vide any facilities for accessing the properties of such characters. If
       you  want  to  be able to use the pattern escapes \P, \p, and \X, which
       refer to Unicode character properties, you must add

         --enable-unicode-properties

       to the configure command. This implies UTF-8 support, even if you  have
       not explicitly requested it.

       Including  Unicode  property  support  adds around 90K of tables to the
       PCRE library, approximately doubling its size. Only the  general  cate-
       gory  properties  such as Lu and Nd are supported. Details are given in
       the pcrepattern documentation.


CODE VALUE OF NEWLINE


       By default, PCRE treats character 10 (linefeed) as the newline  charac-
       ter. This is the normal newline character on Unix-like systems. You can
       compile PCRE to use character 13 (carriage return) instead by adding

         --enable-newline-is-cr

       to the configure command. For completeness there is  also  a  --enable-
       newline-is-lf  option,  which explicitly specifies linefeed as the new-
       line character.


BUILDING SHARED AND STATIC LIBRARIES


       The PCRE building process uses libtool to build both shared and  static
       Unix  libraries by default. You can suppress one of these by adding one
       of

         --disable-shared
         --disable-static

       to the configure command, as required.


POSIX MALLOC USAGE


       When PCRE is called through the POSIX interface (see the pcreposix doc-
       umentation),  additional  working  storage  is required for holding the
       pointers to capturing substrings, because PCRE requires three  integers
       per  substring,  whereas  the POSIX interface provides only two. If the
       number of expected substrings is small, the wrapper function uses space
       on the stack, because this is faster than using malloc() for each call.
       The default threshold above which the stack is no longer used is 10; it
       can be changed by adding a setting such as

         --with-posix-malloc-threshold=20

       to the configure command.
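
       The stack-or-heap decision described above can be sketched as  follows.
       This  is an illustration only, not the wrapper's actual code: the names
       POSIX_MALLOC_THRESHOLD, match_pair and run_with_workspace are hypothet-
       ical, the "match" is faked, and the layout of the three integers is  an
       assumption made for the sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the value set by --with-posix-malloc-threshold. */
#define POSIX_MALLOC_THRESHOLD 10

/* Hypothetical POSIX-style result: two offsets per substring. */
typedef struct { int start; int end; } match_pair;

/* Fills out[0..nmatch-1] with fake offsets.
   Returns 0 if the working vector lived on the stack, 1 if it came from
   malloc(), and -1 if malloc() failed. */
int run_with_workspace(size_t nmatch, match_pair *out)
{
    int on_stack[POSIX_MALLOC_THRESHOLD * 3];  /* three ints per substring */
    int *ovector = on_stack;
    int used_heap = 0;
    size_t i;

    assert(out != NULL || nmatch == 0);

    /* Above the threshold the stack buffer is too small: use the heap. */
    if (nmatch > POSIX_MALLOC_THRESHOLD) {
        ovector = malloc(sizeof(int) * nmatch * 3);
        if (ovector == NULL) return -1;
        used_heap = 1;
    }

    /* Stand-in for the real matcher: pretend substring i spans [i, i+1).
       In this sketch the first two-thirds of the vector hold the offset
       pairs and the final third is left as scratch space. */
    for (i = 0; i < nmatch; i++) {
        ovector[2 * i] = (int)i;
        ovector[2 * i + 1] = (int)i + 1;
    }

    /* Copy the pairs back into the two-integer POSIX-style records. */
    for (i = 0; i < nmatch; i++) {
        out[i].start = ovector[2 * i];
        out[i].end = ovector[2 * i + 1];
    }

    if (used_heap) free(ovector);
    return used_heap;
}
```

       With the threshold at 10, a request for 4 substrings stays on the stack
       and a request for 20 goes through malloc().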


LIMITING PCRE RESOURCE USAGE


       Internally,  PCRE has a function called match(), which it calls repeat-
       edly  (possibly  recursively)  when  matching  a   pattern   with   the
       pcre_exec()  function.  By controlling the maximum number of times this
       function may be called during a single matching operation, a limit  can
       be  placed  on  the resources used by a single call to pcre_exec(). The
       limit can be changed at run time, as described in the pcreapi  documen-
       tation.  The default is 10 million, but this can be changed by adding a
       setting such as

         --with-match-limit=500000

       to  the  configure  command.  This  setting  has  no  effect   on   the
       pcre_dfa_exec() matching function.
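
       The idea of bounding work by counting calls can be shown  with  a  toy.
       The  sketch  below is not PCRE's match() function: it is a tiny anchored
       backtracking matcher (literals, "." and "c*") written only to show  how
       a  call counter turns runaway backtracking into an error return. All of
       its names, including MATCH_LIMIT_EXCEEDED, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MATCH_LIMIT_EXCEEDED (-1)   /* hypothetical error code for this sketch */

typedef struct {
    unsigned long calls;   /* calls made so far in this matching operation */
    unsigned long limit;   /* maximum number of calls allowed */
} match_budget;

/* Returns 1 on a match, 0 on no match, MATCH_LIMIT_EXCEEDED when the
   budget runs out. The whole of 'text' must match the whole of 'pat'. */
static int toy_match(const char *pat, const char *text, match_budget *b)
{
    if (++b->calls > b->limit) return MATCH_LIMIT_EXCEEDED;

    if (pat[0] == '\0') return text[0] == '\0';

    if (pat[1] == '*') {
        /* Try the rest of the pattern after zero, one, two, ... repeats. */
        const char *t = text;
        for (;;) {
            int rc = toy_match(pat + 2, t, b);
            if (rc != 0) return rc;   /* matched, or limit exceeded */
            if (*t == '\0' || (pat[0] != '.' && pat[0] != *t)) return 0;
            t++;
        }
    }

    if (text[0] != '\0' && (pat[0] == '.' || pat[0] == text[0]))
        return toy_match(pat + 1, text + 1, b);
    return 0;
}

/* Entry point: one matching operation with its own call budget. */
int toy_exec(const char *pat, const char *text, unsigned long limit)
{
    match_budget b;
    assert(pat != NULL && text != NULL);
    b.calls = 0;
    b.limit = limit;
    return toy_match(pat, text, &b);
}
```

       A  pattern such as "a*a*a*a*c" against a run of the letter "a" needs no
       match to be found, but the matcher has to try every way of sharing  the
       letters  among  the  four repeats before it can say so; a low limit cuts
       that search short instead of letting it run to completion.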


HANDLING VERY LARGE PATTERNS


       Within  a  compiled  pattern,  offset values are used to point from one
       part to another (for example, from an opening parenthesis to an  alter-
       nation  metacharacter).  By default, two-byte values are used for these
       offsets, leading to a maximum size for a  compiled  pattern  of  around
       64K.  This  is sufficient to handle all but the most gigantic patterns.
       Nevertheless, some people do want to process enormous patterns,  so  it
       is  possible  to compile PCRE to use three-byte or four-byte offsets by
       adding a setting such as

         --with-link-size=3

       to the configure command. The value given must be 2,  3,  or  4.  Using
       longer  offsets slows down the operation of PCRE because it has to load
       additional bytes when handling them.
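
       The relationship between link size and maximum pattern size can be seen
       in a short sketch. The helper names below are hypothetical, and storing
       the  most  significant  byte  first is an assumption of the sketch, not
       something this document specifies.

```c
#include <assert.h>

/* Largest offset that fits in 'link_size' bytes (2, 3, or 4). */
unsigned long max_offset(int link_size)
{
    assert(link_size >= 2 && link_size <= 4);
    /* Avoid shifting by 32, which is undefined if long is 32 bits wide. */
    return (link_size == 4) ? 0xFFFFFFFFUL
                            : (1UL << (8 * link_size)) - 1UL;
}

/* Store 'value' in 'link_size' bytes, most significant byte first. */
void put_offset(unsigned char *code, int link_size, unsigned long value)
{
    int i;
    assert(value <= max_offset(link_size));
    for (i = link_size - 1; i >= 0; i--) {
        code[i] = (unsigned char)(value & 0xFF);
        value >>= 8;
    }
}

/* Read back an offset: one load and shift per byte, which is why
   longer links cost more time. */
unsigned long get_offset(const unsigned char *code, int link_size)
{
    unsigned long value = 0;
    int i;
    for (i = 0; i < link_size; i++)
        value = (value << 8) | code[i];
    return value;
}
```

       Two bytes give a largest offset of 65535, which is where the limit   of
       around  64K  comes  from;  three  bytes  raise it to 16777215. An offset
       of 70000 does not fit in two bytes but is stored in three as the  bytes
       0x01 0x11 0x70.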

       If you build PCRE with an increased link size, test 2 (and  test  5  if
       you  are using UTF-8) will fail. Part of the output of these tests is a
       representation of the compiled pattern, and this changes with the  link
       size.


AVOIDING EXCESSIVE STACK USAGE


       When matching with the pcre_exec() function, PCRE implements backtrack-
       ing by making recursive calls to an internal function  called  match().
       In  environments  where  the size of the stack is limited, this can se-
       verely limit PCRE's operation. (The Unix environment does  not  usually
       suffer  from  this  problem.)  An alternative approach that uses memory
       from the heap to remember data, instead  of  using  recursive  function
       calls,  has been implemented to work round this problem. If you want to
       build a version of PCRE that works this way, add

         --disable-stack-for-recursion

       to the configure command. With this configuration, PCRE  will  use  the
       pcre_stack_malloc  and pcre_stack_free variables to call memory manage-
       ment functions. Separate functions are provided because  the  usage  is
       very  predictable:  the  block sizes requested are always the same, and
       the blocks are always freed in reverse order. A calling  program  might
       be  able  to implement optimized functions that perform better than the
       standard malloc() and  free()  functions.  PCRE  runs  noticeably  more
       slowly when built in this way. This option affects only the pcre_exec()
       function; it is not relevant for the pcre_dfa_exec() function.
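
       Because the blocks are all the same size and are freed in reverse order,
       a calling program could serve them from a simple fixed pool. The sketch
       below is one possible shape for such functions, not code from PCRE: the
       names  lifo_malloc,  lifo_free  and  lifo_blocks_in_use, and both  pool
       constants, are hypothetical. It is correct only if  frees  really  do
       arrive in reverse order of allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_BLOCKS     64    /* hypothetical pool depth */
#define POOL_BLOCK_SIZE 512   /* hypothetical upper bound on a block's size */

/* The union members other than 'bytes' exist only to force alignment. */
static union pool_block {
    unsigned char bytes[POOL_BLOCK_SIZE];
    long double align_ld;
    void *align_p;
    long align_l;
} pool[POOL_BLOCKS];

static size_t pool_top = 0;   /* number of pool blocks currently handed out */

/* Hand out the next pool block; fall back to malloc() if the request is
   too large or the pool is exhausted. */
void *lifo_malloc(size_t size)
{
    if (size <= POOL_BLOCK_SIZE && pool_top < POOL_BLOCKS)
        return pool[pool_top++].bytes;
    return malloc(size);
}

/* Release a block. Relies on reverse-order frees: a pool block is only
   ever released while it is the most recently issued one, so anything
   that is not the current top must have come from malloc(). */
void lifo_free(void *block)
{
    assert(block != NULL);
    if (pool_top > 0 && block == (void *)pool[pool_top - 1].bytes) {
        pool_top--;
        return;
    }
    free(block);
}

/* Number of pool blocks in use; handy for checking for leaks. */
size_t lifo_blocks_in_use(void)
{
    return pool_top;
}

/* A program using a PCRE built with --disable-stack-for-recursion could
   then point the two variables at these functions before matching:
       pcre_stack_malloc = lifo_malloc;
       pcre_stack_free   = lifo_free;                                     */
```

       Allocation  and  release  here  are a single comparison and an  index
       change, which is the kind of saving over a general-purpose malloc() and
       free() that the predictable usage pattern makes possible.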


USING EBCDIC CODE


       PCRE assumes by default that it will run in an  environment  where  the
       character  code  is  ASCII  (or Unicode, which is a superset of ASCII).
       PCRE can, however, be compiled to  run  in  an  EBCDIC  environment  by
       adding

         --enable-ebcdic

       to the configure command.

Last updated: 15 August 2005
Copyright (c) 1997-2005 University of Cambridge.

                                                                  PCREBUILD(3)
