perl5140delta (1) Linux Manual Page
NAME
perl5140delta – what is new for perl v5.14.0
DESCRIPTION
This document describes differences between the 5.12.0 release and the 5.14.0 release.
If you are upgrading from an earlier release such as 5.10.0, first read perl5120delta, which describes differences between 5.10.0 and 5.12.0.
Some of the bug fixes in this release have been backported to subsequent releases of 5.12.x. Those are indicated with the 5.12.x version in parentheses.
Notice
As described in perlpolicy, the release of Perl 5.14.0 marks the official end of support for Perl 5.10. Users of Perl 5.10 or earlier should consider upgrading to a more recent release of Perl.
Core Enhancements
Unicode
Unicode Version 6.0 is now supported (mostly)
Perl comes with the Unicode 6.0 data base updated with Corrigendum #8 <http://www.unicode.org/versions/corrigendum8.html>, with one exception noted below. See <http://unicode.org/versions/Unicode6.0.0/> for details on the new release. Perl does not support any Unicode provisional properties, including the new ones for this release.
Unicode 6.0 has chosen to use the name "BELL" for the character at U+1F514, which is a symbol that looks like a bell, and is used in Japanese cell phones. This conflicts with the long-standing Perl usage of having "BELL" mean the ASCII "BEL" character, U+0007. In Perl 5.14, "\N{BELL}" continues to mean U+0007, but its use generates a deprecation warning message unless such warnings are turned off. The new name for U+0007 in Perl is "ALERT", which corresponds nicely with the existing shorthand sequence for it, "\a". "\N{BEL}" means U+0007, with no warning given. The character at U+1F514 has no name in 5.14, but can be referred to by "\N{U+1F514}". In Perl 5.16, "\N{BELL}" will refer to U+1F514; all code that uses "\N{BELL}" should be converted to use "\N{ALERT}", "\N{BEL}", or "\a" before upgrading.
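As a minimal sketch of the migration described above, the following shows the three spellings of U+0007 that do not depend on the meaning of "BELL", plus the code-point form for the bell symbol. The variable names are illustrative only.

```perl
use strict;
use warnings;
use charnames ':full';    # required on 5.14; later perls load it on demand

# Three spellings of U+0007 that do not rely on the name "BELL".
my %spelling = (
    'ALERT' => "\N{ALERT}",
    'BEL'   => "\N{BEL}",
    '\a'    => "\a",
);

for my $name (sort keys %spelling) {
    printf "%-5s => U+%04X\n", $name, ord $spelling{$name};
}

# The bell symbol can always be addressed by code point.
my $bell_symbol = "\N{U+1F514}";
printf "symbol => U+%04X\n", ord $bell_symbol;
```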
Full functionality for "use feature 'unicode_strings'"
This release provides full functionality for "use feature 'unicode_strings'". Under its scope, all string operations executed and regular expressions compiled (even if executed outside its scope) have Unicode semantics. See “the 'unicode_strings' feature” in feature. However, see “Inverted bracketed character classes and multi-character folds”, below.
This feature avoids most forms of the "Unicode Bug" (see “The "Unicode Bug"” in perlunicode for details). If there is any possibility that your code will process Unicode strings, you are strongly encouraged to use this subpragma to avoid nasty surprises.
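One way to see the difference is to upper-case a string that is not stored internally as UTF-8 but contains a character above the ASCII range. This is a small sketch, assuming no "use locale" and no feature bundle is in effect; the variable names are made up for illustration.

```perl
use strict;
use warnings;

# U+00E0 (a with grave), held as a single native byte, not as UTF-8.
my $byte_string = "\xE0";

# Outside the feature's scope, uc() applies ASCII-only rules to this string.
my $without = uc $byte_string;

my $with;
{
    use feature 'unicode_strings';
    # Inside the scope, the same operation uses Unicode rules.
    $with = uc $byte_string;
}

printf "without feature: U+%04X\n", ord $without;
printf "with feature:    U+%04X\n", ord $with;
```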
"\N{NAME}" and "charnames" enhancements
- "\N{NAME}" and "charnames::vianame" now know about the abbreviated character names listed by Unicode, such as NBSP, SHY, LRO, ZWJ, etc.; all customary abbreviations for the C0 and C1 control characters (such as ACK, BEL, CAN, etc.); and a few new variants of some C1 full names that are in common usage.
- Unicode has several named character sequences, in which particular sequences of code points are given names. "\N{NAME}" now recognizes these.
- "\N{NAME}", "charnames::vianame", and "charnames::viacode" now know about every character in Unicode. In earlier releases of Perl, they didn’t know about the Hangul syllables nor several CJK (Chinese/Japanese/Korean) characters.
- It is now possible to override Perl’s abbreviations with your own custom aliases.
- You can now create a custom alias of the ordinal of a character, known by "\N{NAME}", "charnames::vianame()", and "charnames::viacode()". Previously, aliases had to be to official Unicode character names. This made it impossible to create an alias for unnamed code points, such as those reserved for private use.
- The new function charnames::string_vianame() is a run-time version of "\N{NAME}", returning the string of characters whose Unicode name is its parameter. It can handle Unicode named character sequences, whereas the pre-existing charnames::vianame() cannot, as the latter returns a single code point.
See charnames for details on all these changes.
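The run-time lookups listed above might be exercised as follows. This is only a sketch: it assumes that "KATAKANA LETTER AINU P" is one of the named sequences known to the installed Unicode tables, and the variable names are illustrative.

```perl
use strict;
use warnings;
use charnames ':full';

# vianame() returns a single code point as a number.
my $code_point = charnames::vianame("LATIN SMALL LETTER A");

# string_vianame() returns the character itself, as a string.
my $single = charnames::string_vianame("LATIN SMALL LETTER A");

# A named sequence expands to more than one code point,
# which only string_vianame() can represent.
my $sequence = charnames::string_vianame("KATAKANA LETTER AINU P");

# viacode() maps a code point back to its name, including Hangul syllables.
my $name = charnames::viacode(0xAC00);

printf "vianame:        U+%04X\n", $code_point;
printf "string_vianame: %d character(s)\n", length $single;
printf "named sequence: %d character(s)\n", length $sequence;
print  "viacode:        $name\n";
```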
New warnings categories for problematic (non-)Unicode code points.
Three new warnings subcategories of “utf8” have been added. These allow you to turn off some “utf8” warnings, while allowing other warnings to remain on. The three categories are: "surrogate" when UTF-16 surrogates are encountered; "nonchar" when Unicode non-character code points are encountered; and "non_unicode" when code points above the legal Unicode maximum of 0x10FFFF are encountered.
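A rough illustration of the finer-grained control: the sketch below prints a lone surrogate to an in-memory UTF-8 handle twice, once with only the "surrogate" subcategory disabled and once with it left on. The helper name is hypothetical, and the exact warning text may differ between releases.

```perl
use strict;
use warnings;

# Print a lone surrogate to an in-memory UTF-8 handle and
# return whatever warnings were raised while doing so.
sub warnings_when_printing_surrogate {
    my ($silence) = @_;
    my @seen;
    local $SIG{__WARN__} = sub { push @seen, $_[0] };

    open my $out, '>:utf8', \my $buffer
        or die "cannot open in-memory handle: $!";
    if ($silence) {
        no warnings 'surrogate';    # only this subcategory is disabled
        print {$out} chr(0xD800);
    }
    else {
        print {$out} chr(0xD800);   # "surrogate" is still enabled here
    }
    close $out;
    return @seen;
}

my @quiet = warnings_when_printing_surrogate(1);
my @loud  = warnings_when_printing_surrogate(0);
printf "silenced: %d warning(s), default: %d warning(s)\n",
    scalar @quiet, scalar @loud;
```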
Any unsigned value can be encoded as a character
With this release, Perl is adopting a model that any unsigned value can be treated as a code point and encoded internally (as utf8) without warnings, not just the code points that are legal in Unicode. However, unless utf8 or the corresponding sub-category (see previous item) of lexical warnings have been explicitly turned off, outputting or executing a Unicode-defined operation such as upper-casing on such a code point generates a warning. Attempting to input these using strict rules (such as with the ":encoding(UTF-8)" layer) will continue to fail. Prior to this release, handling was inconsistent and in places, incorrect.
Unicode non-characters, some of which previously were erroneously considered illegal in places by Perl, contrary to the Unicode Standard, are now always legal internally. Inputting or outputting them works the same as with the non-legal Unicode code points, because the Unicode Standard says they are (only) illegal for “open interchange”.
Unicode database files not installed
The Unicode database files are no longer installed with Perl. This doesn’t affect any functionality in Perl and saves significant disk space. If you need these files, you can download them from <http://www.unicode.org/Public/zipped/6.0.0/>.
Regular Expressions
"(?^…)" construct signifies default modifiers
An ASCII caret "^" immediately following a "(?" in a regular expression now means that the subexpression does not inherit surrounding modifiers such as "/i", but reverts to the Perl defaults. Any modifiers following the caret override the defaults.
Stringification of regular expressions now uses this notation. For example, "qr/hlagh/i" would previously be stringified as "(?i-xsm:hlagh)", but now it’s stringified as "(?^i:hlagh)".
The main purpose of this change is to allow tests that rely on the stringification not to have to change whenever new modifiers are added. See “Extended Patterns” in perlre.
This change is likely to break code that compares stringified regular expressions with fixed strings containing "?-xism".
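The following sketch shows the new stringification, the caret resetting inherited modifiers, and one possible way to inspect a pattern without comparing against a fixed string, using regexp_pattern() from the re module. The variable names are illustrative.

```perl
use strict;
use warnings;
use re 'regexp_pattern';

my $re = qr/hlagh/i;

# Stringification now uses the caret notation.
my $string = "$re";

# The caret discards the surrounding /i for the inner group,
# so the "b" must match case-sensitively.
my $lower_b = ("abc" =~ /a(?^:b)c/i) ? 1 : 0;
my $upper_b = ("aBc" =~ /a(?^:b)c/i) ? 1 : 0;

# A comparison that does not depend on the stringified form:
# ask for the pattern text and the flags separately.
my ($pattern, $flags) = regexp_pattern($re);

print "stringified: $string\n";
print "pattern: $pattern, flags: $flags\n";
print "lower b matches: $lower_b, upper b matches: $upper_b\n";
```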
"/d", "/l", "/u", and "/a" modifiers
Four new regular expression modifiers have been added. These are mutually exclusive: only one can be turned on at a time.
- The "/l" modifier says to compile the regular expression as if it were in the scope of "use locale", even if it is not.
- The "/u" modifier says to compile the regular expression as if it were in the scope of a "use feature 'unicode_strings'" pragma.
- The "/d" (default) modifier is used to override any "use locale" and "use feature 'unicode_strings'" pragmas in effect at the time of compiling the regular expression.
- The "/a" regular expression modifier restricts "\s", "\d" and "\w" and the POSIX ("[[:posix:]]") character classes to the ASCII range. Their complements and "\b" and "\B" are correspondingly affected. Otherwise, "/a" behaves like the "/u" modifier, in that case-insensitive matching uses Unicode semantics.
If the "/a" modifier is repeated, then additionally in case-insensitive matching, no ASCII character can match a non-ASCII character. For example,
"k" =~ /\N{KELVIN SIGN}/ai
"\xDF" =~ /ss/ai
match but
"k" =~ /\N{KELVIN SIGN}/aai
"\xDF" =~ /ss/aai
do not match.
See “Modifiers” in perlre for more detail.
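A short sketch of how these modifiers change matching, assuming neither "use locale" nor "use feature 'unicode_strings'" is in effect. The variable names are made up for the example.

```perl
use strict;
use warnings;
use charnames ':full';    # required on 5.14 for \N{NAME}

my $arabic_one = "\x{0661}";    # ARABIC-INDIC DIGIT ONE
my $e_acute    = "\xE9";        # stored as a native byte, not UTF-8

my %result = (
    # \d matches any Unicode digit unless /a restricts it to ASCII.
    'digit'        => ($arabic_one =~ /\d/)  ? 1 : 0,
    'digit /a'     => ($arabic_one =~ /\d/a) ? 1 : 0,

    # /u forces Unicode rules even for a non-UTF-8 string.
    'word default' => ($e_acute =~ /\w/)  ? 1 : 0,
    'word /u'      => ($e_acute =~ /\w/u) ? 1 : 0,

    # /aa additionally forbids ASCII/non-ASCII case-insensitive matches.
    'kelvin /ai'   => ("k" =~ /\N{KELVIN SIGN}/ai)  ? 1 : 0,
    'kelvin /aai'  => ("k" =~ /\N{KELVIN SIGN}/aai) ? 1 : 0,
);

printf "%-13s %d\n", $_, $result{$_} for sort keys %result;
```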
Non-destructive substitution
The substitution ("s///") and transliteration ("y///") operators now support an "/r" option that copies the input variable, carries out the substitution on the copy, and returns the result. The original remains unmodified.
my $old = "cat";
my $new = $old =~ s/cat/dog/r;
# $old is "cat" and $new is "dog"
This is particularly useful with "map". See perlop for more examples.
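For instance, a trimming step inside "map" might look like the sketch below; without "/r" the substitution would modify the elements of the source array in place and return a count. The data and variable names are illustrative.

```perl
use strict;
use warnings;

my @raw = ("  alpha ", "beta  ", " gamma");

# Each element is copied, trimmed, and returned; @raw is left alone.
my @trimmed = map { s/^\s+|\s+$//gr } @raw;

# The same option works for transliteration.
my $word  = "quiet";
my $upper = $word =~ tr/a-z/A-Z/r;

print join(",", @trimmed), "\n";
print "$word -> $upper\n";
```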
Re-entrant regular expression engine
It is now safe to use regular expressions within "(?{...})" and "(??{...})" code blocks inside regular expressions.
These blocks are still experimental, however, and still have problems with lexical ("my") variables and abnormal exiting.
"use re '/flags'"
The "re" pragma now has the ability to turn on regular expression flags till the end of the lexical scope:
use re "/x";
"foo" =~ / (.+) /; # /x implied
See “'/flags' mode” in re for details.
\o{…} for octals
There is a new octal escape sequence, "\o", in doublequote-like contexts. This construct allows large octal ordinals beyond the current max of 0777 to be represented. It also allows you to specify a character in octal which can safely be concatenated with other regex snippets and which won’t be confused with being a backreference to a regex capture group. See “Capture groups” in perlre.
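A brief sketch of both uses, in a string and in a pattern where a bare "\1" would be taken as a backreference. The variable names are illustrative.

```perl
use strict;
use warnings;

# Octal 101 is decimal 65, the letter "A".
my $letter = "\o{101}";

# Ordinals above 0777 can be written too: octal 1000 is decimal 512.
my $wide = "\o{1000}";

# In a pattern, \o{1} always means the character U+0001 ...
my $as_character = ("a\x{1}" =~ /(a)\o{1}/) ? 1 : 0;

# ... whereas \1 after a capture group is a backreference.
my $as_backref = ("aa" =~ /(a)\1/) ? 1 : 0;

printf "letter: %s, wide: U+%04X\n", $letter, ord $wide;
print  "character match: $as_character, backreference match: $as_backref\n";
```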
Add "\p{Titlecase}" as a synonym for "\p{Title}"
This synonym is added for symmetry with the Unicode property names "\p{Uppercase}" and "\p{Lowercase}".
Regular expression debugging output improvement
Regular expression debugging output (turned on by "use re 'debug'") now uses hexadecimal when escaping non-ASCII characters, instead of octal.
Return value of "delete $+{…}"
Custom regular expression engines can now determine the return value of "delete" on an entry of "%+" or "%-".
Syntactical Enhancements
Array and hash container functions accept references
Warning: This feature is considered experimental, as the exact behaviour may change in a future version of Perl.
All builtin functions that operate directly on array or hash containers now also accept unblessed hard references to arrays or hashes:
|----------------------------+---------------------------|
| Traditional syntax         | Terse syntax              |
|----------------------------+---------------------------|
| push @$arrayref, @stuff    | push $arrayref, @stuff    |
| unshift @$arrayref, @stuff | unshift $arrayref, @stuff |
| pop @$arrayref             | pop $arrayref             |
| shift @$arrayref           | shift $arrayref           |
| splice @$arrayref, 0, 2    | splice $arrayref, 0, 2    |
| keys %$hashref             | keys $hashref             |
| keys @$arrayref            | keys $arrayref            |
| values %$hashref           | values $hashref           |
| values @$arrayref          | values $arrayref          |
| ($k, $v) = each %$hashref  | ($k, $v) = each $hashref  |
| ($k, $v) = each @$arrayref | ($k, $v) = each $arrayref |
|----------------------------+---------------------------|
This allows these builtin functions to act on long dereferencing chains or on the return value of subroutines without needing to wrap them in "@{}" or "%{}":
push @{$obj->tags}, $new_tag;  # old way
push $obj->tags, $new_tag;     # new way

for ( keys %{$hoh->{genres}{artists}} ) {...}  # old way
for ( keys $hoh->{genres}{artists} ) {...}     # new way
Single term prototype
The "+" prototype is a special alternative to "$" that acts like "\[@%]" when given a literal array or hash variable, but will otherwise force scalar context on the argument. See “Prototypes” in perlsub.
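As an illustration, a subroutine declared with the "+" prototype receives a reference whether it is called with a literal container or with a scalar that already holds a reference. The subroutine name element_count is hypothetical.

```perl
use strict;
use warnings;

# With the "+" prototype, a literal @array or %hash is passed as a
# reference; any other argument is evaluated in scalar context.
sub element_count(+) {
    my ($container) = @_;
    return ref $container eq 'HASH'
        ? scalar keys %$container
        : scalar @$container;
}

my @list = (1, 2, 3);
my %map  = (a => 1, b => 2);

my $from_array = element_count(@list);           # receives \@list
my $from_hash  = element_count(%map);            # receives \%map
my $from_ref   = element_count([1, 2, 3, 4]);    # receives the reference as-is

print "$from_array $from_hash $from_ref\n";
```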
"package" block syntax
A package declaration can now contain a code block, in which case the declaration is in scope inside that block only. So "package Foo { ... }" is precisely equivalent to "{ package Foo; ... }". It also works with a version number in the declaration, as in "package Foo 1.2 { ... }", which is its most attractive feature. See perlfunc.
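A minimal sketch of the block form with a version number; the package name Stack and its methods are invented for the example.

```perl
use strict;
use warnings;

# The package, and its $VERSION of 1.2, apply only inside the braces.
package Stack 1.2 {
    sub new       { my ($class) = @_; return bless { items => [] }, $class }
    sub push_item { my ($self, $item) = @_; push @{ $self->{items} }, $item; return $self }
    sub size      { my ($self) = @_; return scalar @{ $self->{items} } }
}

# After the closing brace the enclosing package is in effect again.
my $current_package = __PACKAGE__;

my $stack = Stack->new;
$stack->push_item("a")->push_item("b");

printf "Stack %s holds %d item(s); current package is %s\n",
    Stack->VERSION, $stack->size, $current_package;
```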
Statement labels can appear in more places
Statement labels can now occur before any type of statement or declaration, such as "package".
Stacked labels
Multiple statement labels can now appear before a single statement.
Uppercase X/B allowed in hexadecimal/binary literals
Literals may now use either upper case "0X..." or "0B..." prefixes, in addition to the already supported "0x..." and "0b..." syntax [perl #76296].
C, Ruby, Python, and PHP already support this syntax, and it makes Perl more internally consistent: a round-trip with "eval sprintf "%#X", 0x10" now returns 16, just like "eval sprintf "%#x", 0x10".
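The round trip mentioned above can be checked directly; the variable names here are illustrative.

```perl
use strict;
use warnings;

# Upper-case prefixes are accepted in source code.
my $hex = 0X1F;     # 31
my $bin = 0B101;    # 5

# sprintf "%#X" produces "0X10", which eval can now read back.
my $upper_text  = sprintf "%#X", 0x10;
my $round_upper = eval $upper_text;
my $round_lower = eval sprintf "%#x", 0x10;

print "hex=$hex bin=$bin\n";
print "$upper_text -> $round_upper, lower-case round trip -> $round_lower\n";
```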
Overridable tie functions
"tie", "tied" and "untie" can now be overridden [perl #75902].
Exception Handling
To make them more reliable and consistent, several changes have been made to how "die", "warn", and $@ behave.
- When an exception is thrown inside an "eval", the exception is no longer at risk of being clobbered by destructor code running during unwinding. Previously, the exception was written into $@ early in the throwing process, and would be overwritten if "eval" was used internally in the destructor for an object that had to be freed while exiting from the outer "eval". Now the exception is written into $@ last thing before exiting the outer "eval", so the code running immediately thereafter can rely on the value in $@ correctly corresponding to that "eval". ($@ is still also set before exiting the "eval", for the sake of destructors that rely on this.)
Likewise, a "local $@" inside an "eval" no longer clobbers any exception thrown in its scope. Previously, the restoration of $@ upon unwinding would overwrite any exception being thrown. Now the exception gets to the "eval" anyway. So "local $@" is safe before a "die".
Exceptions thrown from object destructors no longer modify the $@ of the surrounding context. (If the surrounding context was exception unwinding, this used to be another way to clobber the exception being thrown.) Previously such an exception was sometimes emitted as a warning, and then either was string-appended to the surrounding $@ or completely replaced the surrounding $@, depending on whether that exception and the surrounding $@ were strings or objects. Now, an exception in this situation is always emitted as a warning, leaving the surrounding $@ untouched. In addition to object destructors, this also affects any function call run by XS code using the "G_KEEPERR" flag.
- Warnings for "warn" can now be objects in the same way as exceptions for "die". If an object-based warning gets the default handling of writing to standard error, it is stringified as before with the filename and line number appended. But a $SIG{__WARN__} handler now receives an object-based warning as an object, where previously it was passed the result of stringifying the object.
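The behaviours described in this section might be demonstrated as follows. This is a sketch only; the Cleanup and Notice classes are invented for the example.

```perl
use strict;
use warnings;

package Cleanup {
    sub new { return bless {}, shift }
    # The destructor uses eval internally, which used to clobber
    # an exception that was propagating through the outer eval.
    sub DESTROY { eval { 1 }; return }
}

package Notice {
    sub new { my ($class, $text) = @_; return bless { text => $text }, $class }
}

# 1. The exception survives a destructor that runs during unwinding.
eval {
    my $guard = Cleanup->new;
    die "original failure\n";
};
my $outer_error = $@;

# 2. "local $@" before a die no longer hides the exception.
eval { local $@; die "inner failure\n" };
my $local_error = $@;

# 3. A __WARN__ handler receives an object-based warning as an object.
my $received;
{
    local $SIG{__WARN__} = sub { $received = $_[0] };
    warn Notice->new("disk nearly full");
}

print "outer: $outer_error";
print "local: $local_error";
print "handler received a ", ref($received), " object\n";
```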
Other Enhancements
Assignment to $0 sets the legacy process name with prctl() on Linux
On Linux the legacy process name is now set with prctl(2), in addition to altering the POSIX name via "argv[0]", as Perl has done since version 4.000. Now system utilities that read the legacy process name such as ps, top, and killall recognize the name you set when assigning to $0. The string you supply is truncated at 16 bytes; this limitation is imposed by Linux.
srand() now returns the seed
This allows programs that need to have repeatable results not to have to come up with their own seed-generating mechanism. Instead, they can use srand() and stash the return value for future use. One example is a test program with too many combinations to test comprehensively in the time available for each run. It can test a random subset each time and, should there be a failure, log the seed used for that run so this can later be used to produce the same results.
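A sketch of that workflow: record the seed from srand(), then pass it back later to reproduce the same sequence. The variable names are illustrative.

```perl
use strict;
use warnings;

# Let Perl pick a seed and remember which one it chose.
my $seed  = srand();
my @first = map { int rand 1000 } 1 .. 5;

# Later (for example, when reproducing a failed test run),
# seed the generator with the logged value.
srand($seed);
my @replay = map { int rand 1000 } 1 .. 5;

print "seed:   $seed\n";
print "first:  @first\n";
print "replay: @replay\n";
```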
printf-like functions understand post-1980 size modifiers
Perl’s printf and sprintf operators, and Perl’s internal printf replacement function, now understand the C90 size modifiers “hh” ("char"), “z” ("size_t"), and “t” ("ptrdiff_t"). Also, when compiled with a C99 compiler, Perl now understands the size modifier “j” ("intmax_t") (but this is not portable).
So, for example, on any modern machine, "sprintf("%hhd", 257)" returns “1”.
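A few of these modifiers in use; this sketch assumes a typical platform with 8-bit "char", and the variable names are illustrative.

```perl
use strict;
use warnings;

# "hh" converts the argument to a C char first, so 257 wraps around.
my $wrapped = sprintf "%hhd", 257;

# "z" and "t" select size_t and ptrdiff_t respectively.
my $size = sprintf "%zu", 42;
my $diff = sprintf "%td", -7;

print "hhd: $wrapped, zu: $size, td: $diff\n";
```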
New global variable "${^GLOBAL_PHASE}"
A new global variable, "${^GLOBAL_PHASE}", has been added to allow introspection of the current phase of the Perl interpreter. It’s explained in detail in “${^GLOBAL_PHASE}” in perlvar and in “BEGIN, UNITCHECK, CHECK, INIT and END” in perlmod.
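One plausible use is a destructor that skips its work during global destruction, when other objects may already have been freed. The TempResource class and the @released array below are hypothetical.

```perl
use strict;
use warnings;

our @released;

package TempResource {
    sub new { my ($class, $name) = @_; return bless { name => $name }, $class }

    sub DESTROY {
        my ($self) = @_;
        # During global destruction, other data may already be gone,
        # so do nothing in that phase.
        return if ${^GLOBAL_PHASE} eq 'DESTRUCT';
        push @main::released, $self->{name};
    }
}

{
    my $resource = TempResource->new("scratch file");
}   # $resource is freed here, while the program is still running

print "phase while running: ${^GLOBAL_PHASE}\n";
print "released: @released\n";
```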
"-d:-foo" calls "Devel::foo::unimport"
The syntax -d:foo was extended in 5.6.1 to make -d:foo=bar equivalent to -MDevel::foo=bar, which expands internally to "use Devel::foo 'bar'". Perl now allows prefixing the module name with "-", with the same semantics as -M; that is:
- "-d:-foo"
- Equivalent to -M-Devel::foo: expands to "no Devel::foo" and calls "Devel::foo->unimport()" if that method exists.
- "-d:-foo=bar"
- Equivalent to -M-Devel::foo=bar: expands to "no Devel::foo 'bar'", and calls "Devel::foo->unimport("bar")" if that method exists.
This is particularly useful for suppressing the default actions of a "Devel::*" module’s "import" method whilst still loading it for debugging.
Filehandle method calls load IO::File on demand
When a method call on a filehandle would die because the method cannot be resolved and IO::File has not been loaded, Perl now loads IO::File via "require" and attempts method resolution again:
open my $fh, ">", $file;
$fh->binmode(":raw");  # loads IO::File and succeeds
This also works for globs like "STDOUT", "STDERR", and "STDIN":
STDOUT->autoflush(1);
Because this on-demand load happens only if method resolution fails, the legacy approach of manually loading an IO::File parent class for partial method support still works as expected:
use IO::Handle;
open my $fh, ">", $file;
$fh->autoflush(1);  # IO::File not loaded
Improved IPv6 support
The "Socket" module provides new affordances for IPv6, including implementations of the "Socket::getaddrinfo()" and "Socket::getnameinfo()" functions, along with related constants and a handful of new functions. See Socket.
DTrace probes now include package name
The "DTrace" probes now include an additional argument, "arg3", which contains the package the subroutine being entered or left was compiled in.
For example, using the following DTrace script:
perl$target:::sub-entry
{
    printf("%s::%s\n", copyinstr(arg0), copyinstr(arg3));
}
and then running:
$ perl -e 'sub test { }; test'
"DTrace" will print:
main::test
New C APIs
See “Internal Changes”.
Security
User-defined regular expression properties
“User-Defined Character Properties” in perlunicode documented that you can create custom properties by defining subroutines whose names begin with “In” or “Is”. However, Perl did not actually enforce that naming restriction, so "\p{foo::bar}" could call foo::bar() if it existed. The documented convention is now enforced.
Also, Perl no longer allows tainted regular expressions to invoke a user-defined property. It simply dies instead [perl #82616].
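For reference, a property that follows the enforced naming convention might look like the sketch below. The property name IsVowel is invented for the example, and the subroutine returns one hexadecimal code point per line, as described in perlunicode.

```perl
use strict;
use warnings;

# The name must begin with "In" or "Is" for \p{...} to call it.
sub IsVowel {
    return join "", map { sprintf "%X\n", ord } qw(a e i o u);
}

# Count the characters of the word that have the custom property.
my $word  = "education";
my $count = () = $word =~ /\p{IsVowel}/g;

print "$word contains $count vowel(s)\n";
```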
Incompatible Changes
Perl 5.14.0 is not binary-compatible with any previous stable release.
In addition to the sections that follow, see “C API Changes”.
Regular Expressions and String Escapes
Inverted bracketed character classes and multi-character folds
Some characters match a sequence of two or three characters in "/i" regular expression matching under Unicode rules. One example is "LATIN SMALL LETTER SHARP S" which matches the sequence "ss".
'ss' =~ /\A[\N{LATIN SMALL LETTER SHARP S}]\z/i  # Matches
This, however, can lead to very counter-intuitive results, especially when inverted. Because of this, Perl 5.14 does not use multi-character "/i" matching in inverted character classes.
'ss' =~ /\A[^\N{LATIN SMALL LETTER SHARP S}]+\z/i  # ???
This should match any sequence of characters that is neither "SHARP S" nor what "SHARP S" matches under "/i". "s" isn’t "SHARP S", but Unicode says that "ss" is what "SHARP S" matches under "/i". So which one “wins”? Do you fail the match because the string has "ss" or accept it because it has an "s" followed by another "s"?
Earlier releases of Perl did allow this multi-character matching, but due to bugs, it mostly did not work.
\400-\777
In certain circumstances, "\400"–"\777" in regexes have behaved differently than they behave in all other doublequote-like contexts. Since 5.10.1, Perl has issued a deprecation warning when this happens. Now, these literals behave the same in all doublequote-like contexts, namely to be equivalent to "
