Perl5160delta (1) Linux Manual Page

NAME

perl5160delta – what is new for perl v5.16.0

DESCRIPTION

This document describes differences between the 5.14.0 release and the 5.16.0 release.

If you are upgrading from an earlier release such as 5.12.0, first read perl5140delta, which describes differences between 5.12.0 and 5.14.0.

Some bug fixes in this release have been backported to later releases of 5.14.x. Those are indicated with the 5.14.x version in parentheses.

Notice

With the release of Perl 5.16.0, the 5.12.x series of releases is now out of its support period. There may be future 5.12.x releases, but only in the event of a critical security issue. Users of Perl 5.12 or earlier should consider upgrading to a more recent release of Perl.

This policy is described in greater detail in perlpolicy.

Core Enhancements

use VERSION

As of this release, version declarations like "use v5.16" now disable all features before enabling the new feature bundle. This means that the following holds true:

    use 5.016;
    # only 5.16 features enabled here
    use 5.014;
    # only 5.14 features enabled here (not 5.16)
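
The bundle replacement shown above can be observed at run time. The following is a minimal sketch, assuming a perl of at least 5.16: it uses "fc" (a feature in the 5.16 bundle but not the 5.14 one) as a probe, and string "eval" only so that both declarations can be tried in one script. The variable names are illustrative.

```perl
use strict;
use warnings;

# fc() is part of the 5.16 feature bundle, so it works after "use 5.016".
my $in_516 = eval q{ use 5.016; fc("A") };
my $err_516 = $@;

# A later "use 5.014" replaces the bundle instead of adding to it, so "fc"
# is no longer a built-in and the call looks for a user sub named main::fc.
my $in_514 = eval q{ use 5.016; use 5.014; fc("A") };
my $err_514 = $@;

print "under 5.16 bundle: ", (defined $in_516 ? $in_516 : "failed: $err_516"), "\n";
print "under 5.14 bundle: ", (defined $in_514 ? $in_514 : "failed: $err_514"), "\n";
```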

"use v5.12" and higher continue to enable strict, but explicit "use strict" and "no strict" now override the version declaration, even when they come first:

    no strict;
    use 5.012;
    # no strict here

There is a new “:default” feature bundle that represents the set of features enabled before any version declaration or "use feature" has been seen. Version declarations below 5.10 now enable the “:default” feature set. This does not actually change the behavior of "use v5.8", because features added to the “:default” set are those that were traditionally enabled by default, before they could be turned off.

"no feature" now resets to the default feature set. To disable all features (which is likely to be a pretty special-purpose request, since it presumably won’t match any named set of semantics) you can now write "no feature ':all'".

$[ is now disabled under "use v5.16". It is part of the default feature set and can be turned on or off explicitly with "use feature 'array_base'".

__SUB__

The new "__SUB__" token, available under the "current_sub" feature (see feature) or "use v5.16", returns a reference to the current subroutine, making it easier to write recursive closures.
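
As an illustration of the recursive-closure case, here is a minimal sketch, assuming the "current_sub" feature is enabled explicitly. The factorial routine and its variable names are only an example.

```perl
use strict;
use warnings;
use feature 'current_sub';

# An anonymous sub that calls itself through __SUB__. No outer variable has
# to be captured for the recursion, so the closure does not refer to itself
# and no reference cycle is created.
my $factorial = sub {
    my ($n) = @_;
    return $n <= 1 ? 1 : $n * __SUB__->($n - 1);
};

my $result = $factorial->(5);
print "$result\n";    # prints 120
```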

New and Improved Built-ins

More consistent "eval"

The "eval" operator sometimes treats a string argument as a sequence of characters and sometimes as a sequence of bytes, depending on the internal encoding. The internal encoding is not supposed to make any difference, but there is code that relies on this inconsistency.

The new "unicode_eval" and "evalbytes" features (enabled under "use 5.16.0") resolve this. The "unicode_eval" feature causes "eval $string" to treat the string always as Unicode. The "evalbytes" feature provides a function, itself called "evalbytes", which evaluates its argument always as a string of bytes.

These features also fix oddities with source filters leaking to outer dynamic scopes.

See feature for more detail.
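
The difference between the two can be sketched as follows. This is a minimal example, assuming both features are enabled explicitly. The source string holds the two bytes of UTF-8 encoded U+00E9 inside a string literal, plus a "use utf8" declaration, which the feature documentation says is ignored by "eval" under "unicode_eval" and honoured by "evalbytes".

```perl
use strict;
use warnings;
use feature qw(unicode_eval evalbytes);

# Source text containing the bytes 0xC3 0xA9 inside a string literal.
my $source = qq{use utf8; length("\xC3\xA9")};

# Under unicode_eval the source is a string of characters and the inner
# "use utf8" has no effect: the literal holds two characters.
my $as_characters = eval $source;

# evalbytes parses the source as bytes, so the inner "use utf8" applies and
# the two bytes are decoded as a single character.
my $as_bytes = evalbytes $source;

print "eval: $as_characters, evalbytes: $as_bytes\n";
```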

"substr" lvalue revamp

When "substr" is called in lvalue or potential lvalue context with two or three arguments, a special lvalue scalar is returned that modifies the original string (the first argument) when assigned to.

Previously, the offsets (the second and third arguments) passed to "substr" would be converted immediately to match the string, negative offsets being translated to positive and offsets beyond the end of the string being truncated.

Now, the offsets are recorded without modification in the special lvalue scalar that is returned, and the original string is not even looked at by "substr" itself, but only when the returned lvalue is read or modified.

These changes result in an incompatible change:

If the original string changes length after the call to "substr" but before assignment to its return value, negative offsets will remember their position from the end of the string, affecting code like this:


    my $string = "string";
    my $lvalue = \substr $string, -4, 2;
    print $$lvalue, "\n"; # prints "ri"
    $string = "bailing twine";
    print $$lvalue, "\n"; # prints "wi"; used to print "il"

The same thing happens with an omitted third argument. The returned lvalue will always extend to the end of the string, even if the string becomes longer.
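
The omitted-length case can be sketched in the same way. This is a minimal example, assuming perl 5.16 or later; the variable names are illustrative.

```perl
use strict;
use warnings;

my $string = "string";
my $tail   = \substr $string, 3;    # no length: from offset 3 to the end

my $before = $$tail;                # "ing"

# The string grows after the lvalue was created. The lvalue still means
# "offset 3 to the end", so it now covers the longer tail.
$string = "stringent";
my $after = $$tail;                 # "ingent"

print "$before -> $after\n";
```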

Since this change also allowed many bugs to be fixed (see "The "substr" operator"), and since the behavior of negative offsets has never been specified, the change was deemed acceptable.

Return value of "tied"

The value returned by "tied" on a tied variable is now the actual scalar that holds the object to which the variable is tied. This lets ties be weakened with "Scalar::Util::weaken(tied $tied_variable)".
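
A minimal sketch of weakening a tie, assuming perl 5.16 or later. The "Counter" class and the $destroyed flag are hypothetical and exist only to make the object's lifetime visible.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my $destroyed = 0;

package Counter;
sub TIESCALAR { my ($class) = @_; my $count = 0; return bless \$count, $class }
sub FETCH     { my ($self)  = @_; return ++$$self }
sub DESTROY   { $destroyed++ }

package main;

# tie returns the object, so $object is a second, strong reference to it.
my $object = tie my $counter, 'Counter';

# Weaken the reference held by the tie itself. From now on only $object
# keeps the Counter object alive.
weaken(tied $counter);

my $first = $counter;    # FETCH still works while $object exists

# Dropping the last strong reference destroys the object even though
# $counter is still tied.
undef $object;

print "first fetch: $first, destroyed: $destroyed\n";
```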

Unicode Support

Supports (almost) Unicode 6.1

Besides the addition of whole new scripts, and new characters in existing scripts, this new version of Unicode, as always, makes some changes to existing characters. One change that may trip up some applications is that the General Category of two characters in the Latin-1 range, PILCROW SIGN and SECTION SIGN, has been changed from Other_Symbol to Other_Punctuation. The same change has been made for a character in each of Tibetan, Ethiopic, and Aegean. The code points U+3248..U+324F (CIRCLED NUMBER TEN ON BLACK SQUARE through CIRCLED NUMBER EIGHTY ON BLACK SQUARE) have had their General Category changed from Other_Symbol to Other_Numeric. The Line Break property has changes for Hebrew and Japanese; and because of other changes in 6.1, the Perl regular expression construct "\X" now works differently for some characters in Thai and Lao.

New aliases (synonyms) have been defined for many property values; these, along with the previously existing ones, are all cross-indexed in perluniprops.

The return value of "charnames::viacode()" is affected by other changes:

 Code point      Old Name             New Name
   U+000A    LINE FEED (LF)        LINE FEED
   U+000C    FORM FEED (FF)        FORM FEED
   U+000D    CARRIAGE RETURN (CR)  CARRIAGE RETURN
   U+0085    NEXT LINE (NEL)       NEXT LINE
   U+008E    SINGLE-SHIFT 2        SINGLE-SHIFT-2
   U+008F    SINGLE-SHIFT 3        SINGLE-SHIFT-3
   U+0091    PRIVATE USE 1         PRIVATE USE-1
   U+0092    PRIVATE USE 2         PRIVATE USE-2
   U+2118    SCRIPT CAPITAL P      WEIERSTRASS ELLIPTIC FUNCTION

Perl will accept any of these names as input, but "charnames::viacode()" now returns the new name of each pair. The change for U+2118 is considered by Unicode to be a correction; that is, the original name was a mistake (but again, it will remain forever valid to use it to refer to U+2118). Most of the other changes are fallout from a mistake Unicode 6.0 made in naming a character used in Japanese cell phones “BELL”, which conflicts with the longstanding industry use of (and Unicode’s recommendation to use) that name to mean the ASCII control character at U+0007. Therefore, that name has been deprecated in Perl since v5.14, and any use of it will raise a warning message (unless turned off). The name “ALERT” is now the preferred name for U+0007, with “BEL” an acceptable short form. The name for the new cell phone character, at code point U+1F514, remains undefined in this version of Perl (hence we don’t implement quite all of Unicode 6.1), but starting in v5.18, BELL will mean this character, and not U+0007.

Unicode has taken steps to make sure that this sort of mistake does not happen again. The Standard now includes all generally accepted names and abbreviations for control characters, whereas previously it didn’t (though there were recommended names for most of them, which Perl used). This means that most of those recommended names are now officially in the Standard. Unicode did not recommend names for the four code points listed above between U+008E and U+0092, and in standardizing them Unicode subtly changed the names that Perl had previously given them, by replacing the final blank in each name by a hyphen. Unicode also officially accepts names that Perl had deprecated, such as FILE SEPARATOR. Now the only deprecated name is BELL. Finally, Perl now uses the new official names instead of the old (now considered obsolete) names for the first four code points in the list above (the ones which have the parentheses in them).

Now that the names have been placed in the Unicode standard, these kinds of changes should not happen again, though corrections, such as to U+2118, are still possible.

Unicode also added some name abbreviations, which Perl now accepts: SP for SPACE; TAB for CHARACTER TABULATION; NEW LINE, END OF LINE, NL, and EOL for LINE FEED; LOCKING-SHIFT ONE for SHIFT OUT; LOCKING-SHIFT ZERO for SHIFT IN; and ZWNBSP for ZERO WIDTH NO-BREAK SPACE.
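
The name handling described in this section can be explored with the run-time functions of the "charnames" module. The following is a minimal sketch, assuming perl 5.16 or later. The name returned by "viacode" depends on the Unicode version built into the running perl, so it is printed rather than stated here.

```perl
use strict;
use warnings;
use charnames ':full';

# Both the old and the new name of each pair are accepted as input.
my $old_name = charnames::vianame("SCRIPT CAPITAL P");
my $new_name = charnames::vianame("WEIERSTRASS ELLIPTIC FUNCTION");

# ALERT is the preferred name for U+0007; EOL is one of the new
# abbreviations for LINE FEED.
my $alert = charnames::vianame("ALERT");
my $eol   = charnames::vianame("EOL");

printf "U+%04X U+%04X U+%04X U+%04X\n", $old_name, $new_name, $alert, $eol;

# Going the other way yields one name per code point.
print charnames::viacode(0x2118), "\n";
```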

More details on this version of Unicode are provided in <http://www.unicode.org/versions/Unicode6.1.0/>.

"use charnames" is no longer needed for "\N{name}"

When "\N{name}" is encountered, the "charnames" module is now automatically loaded when needed as if the ":full" and ":short" options had been specified. See charnames for more information.
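
A minimal sketch of the automatic loading, assuming perl 5.16 or later. Note that the script contains no "use charnames" line. The second literal uses the "script:name" form that the ":short" option provides.

```perl
use strict;
use warnings;

# Full name, as with the ":full" option.
my $alpha_full = "\N{GREEK SMALL LETTER ALPHA}";

# Short form, as with the ":short" option.
my $alpha_short = "\N{greek:alpha}";

printf "U+%04X U+%04X\n", ord $alpha_full, ord $alpha_short;
```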

"\N{…}" can now have Unicode loose name matching

This is described in the "charnames" item in “Updated Modules and Pragmata” below.

Unicode Symbol Names

Perl now has proper support for Unicode in symbol names. It used to be that "*{$foo}" would ignore the internal UTF8 flag and use the bytes of the underlying representation to look up the symbol. That meant that "*{"
