perlguts (1) Linux Manual Page
NAME
perlguts – Introduction to the Perl API
DESCRIPTION
This document attempts to describe how to use the Perl API, as well as to provide some info on the basic workings of the Perl core. It is far from complete and probably contains many errors. Please refer any questions or comments to the author below.
Variables
Datatypes
Perl has three typedefs that handle Perl’s three main data types:
SV Scalar Value
AV Array Value
HV Hash Value
Each typedef has specific routines that manipulate the various data types.
What is an IV?
Perl uses a special typedef IV, which is a simple signed integer type that is guaranteed to be large enough to hold a pointer (as well as an integer). Additionally, there is the UV, which is simply an unsigned IV.
Perl also uses two special typedefs, I32 and I16, which will always be at least 32 and 16 bits long, respectively. (Again, there are U32 and U16, as well.) They will usually be exactly 32 and 16 bits long, but on Crays they will both be 64 bits.
Working with SVs
An SV can be created and loaded with one command. There are five types of values that can be loaded: an integer value (IV), an unsigned integer value (UV), a double (NV), a string (PV), and another scalar (SV). (“PV” stands for “Pointer Value”. You might think that it is misnamed because it is described as pointing only to strings. However, it is possible to have it point to other things. For example, it could point to an array of UVs. But, using it for non-strings requires care, as the underlying assumption of much of the internals is that PVs are just for strings. Often, for example, a trailing "NUL" is tacked on automatically. The non-string use is documented only in this paragraph.)
The seven routines are:
SV *newSViv(IV);
SV *newSVuv(UV);
SV *newSVnv(double);
SV *newSVpv(const char *, STRLEN);
SV *newSVpvn(const char *, STRLEN);
SV *newSVpvf(const char *, ...);
SV *newSVsv(SV *);
"STRLEN" is an integer type ("Size_t", usually defined as "size_t" in config.h) guaranteed to be large enough to represent the size of any string that perl can handle.
In the unlikely case of an SV requiring more complex initialization, you can create an empty SV with newSV(len). If "len" is 0, an empty SV of type NULL is returned. Otherwise an SV of type PV is returned with len + 1 (for the "NUL") bytes of storage allocated, accessible via SvPVX. In both cases the SV has the undef value.
SV *sv = newSV(0); /* no storage allocated */
SV *sv = newSV(10); /* 10 (+1) bytes of uninitialised storage
* allocated */
To change the value of an already-existing SV, there are eight routines:
void sv_setiv(SV *, IV);
void sv_setuv(SV *, UV);
void sv_setnv(SV *, double);
void sv_setpv(SV *, const char *);
void sv_setpvn(SV *, const char *, STRLEN);
void sv_setpvf(SV *, const char *, ...);
void sv_vsetpvfn(SV *, const char *, STRLEN, va_list *,
SV **, Size_t, bool *);
void sv_setsv(SV *, SV *);
Notice that you can choose to specify the length of the string to be assigned by using "sv_setpvn", "newSVpvn", or "newSVpv", or you may allow Perl to calculate the length by using "sv_setpv" or by specifying 0 as the second argument to "newSVpv". Be warned, though, that Perl will determine the string’s length by using "strlen", which depends on the string terminating with a "NUL" character, and not otherwise containing NULs.
The arguments of "sv_setpvf" are processed like "sprintf", and the formatted output becomes the value.
"sv_vsetpvfn" is an analogue of "vsprintf", but it allows you to specify either a pointer to a variable argument list or the address and length of an array of SVs. The last argument points to a boolean; on return, if that boolean is true, then locale-specific information has been used to format the string, and the string’s contents are therefore untrustworthy (see perlsec). This pointer may be NULL if that information is not important. Note that this function requires you to specify the length of the format.
The "sv_set*()" functions are not generic enough to operate on values that have “magic”. See “Magic Virtual Tables” later in this document.
All SVs that contain strings should be terminated with a "NUL" character. If a string is not "NUL"-terminated, there is a risk of core dumps and corruptions from code which passes the string to C functions or system calls that expect a "NUL"-terminated string. Perl's own functions typically add a trailing "NUL" for this reason. Nevertheless, you should be very careful when you pass a string stored in an SV to a C function or system call.
To access the actual value that an SV points to, you can use the macros:
SvIV(SV*)
SvUV(SV*)
SvNV(SV*)
SvPV(SV*, STRLEN len)
SvPV_nolen(SV*)
which will automatically coerce the actual scalar type into an IV, UV, double, or string.
In the "SvPV" macro, the length of the string returned is placed into the variable "len" (this is a macro, so you do not use &len). If you do not care what the length of the data is, use the "SvPV_nolen" macro. Historically the "SvPV" macro with the global variable "PL_na" has been used in this case. But that can be quite inefficient because "PL_na" must be accessed in thread-local storage in threaded Perl. In any case, remember that Perl allows arbitrary strings of data that may both contain NULs and might not be terminated by a "NUL".
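Also remember that C doesn't allow you to safely say "foo(SvPV(s, len), len);". The order in which function arguments are evaluated is not specified, so the second argument may read "len" before "SvPV" has assigned it. It might work with your compiler, but it won't work for everyone. Break this sort of statement up into separate assignments: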
SV *s;
STRLEN len;
char *ptr;
ptr = SvPV(s, len);
foo(ptr, len);
If you want to know if the scalar value is TRUE, you can use:
SvTRUE(SV*)
Although Perl will automatically grow strings for you, if you need to force Perl to allocate more memory for your SV, you can use the macro
SvGROW(SV*, STRLEN newlen)
which will determine if more memory needs to be allocated. If so, it will call the function "sv_grow". Note that "SvGROW" can only increase, not decrease, the allocated memory of an SV and that it does not automatically add space for the trailing "NUL" byte (perl’s own string functions typically do "SvGROW(sv, len + 1)").
If you want to write to an existing SV's buffer and set its value to a string, use SvPV_force() or one of its variants to force the SV to be a PV. This will remove any of various types of non-stringness from the SV while preserving the content of the SV in the PV. This can be used, for example, to append data from an API function to a buffer without extra copying:
(void)SvPVbyte_force(sv, len);
s = SvGROW(sv, len + needlen + 1);
/* something that modifies up to needlen bytes at s+len, but
modifies newlen bytes
eg. newlen = read(fd, s + len, needlen);
ignoring errors for these examples
*/
s[len + newlen] = '\0';
