perl5110delta (1) Linux Manual Page
NAME
perl5110delta – what is new for perl v5.11.0
DESCRIPTION
This document describes differences between the 5.10.0 release and the 5.11.0 development release.
Incompatible Changes
Unicode interpretation of \w, \d, \s, and the POSIX character classes redefined.
Previous versions of Perl tried to map POSIX-style character class definitions onto Unicode property names so that patterns would “dwim” (do what I mean) when matched against Latin-1 or Unicode strings. This proved to be a mistake. It broke character class negation, caused forward-compatibility problems (Unicode keeps updating its property definitions and adding new characters), and led to other problems.
We have therefore defined a new set of artificial “Unicode” property names. These are used when patterns that contain POSIX-style character classes, or Perl's short-form escape classes such as \w and \d, are matched against Unicode strings.
The key change is that \d no longer matches every digit in the Unicode standard (there are hundreds of them), nor does \w match every word character in the standard. Instead, each matches exactly its POSIX or Perl definition: \d matches only [0-9], and \w matches only [0-9A-Z_a-z].
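The sketch below makes the difference concrete without depending on which Perl release is installed. It is a Python analogy and an assumption of this sketch, not part of perl itself. Python's `re` module gives `\d` and `\w` a Unicode-wide meaning by default, which is comparable to the old behaviour described above. The `re.ASCII` flag restricts them to the POSIX/Perl definitions. The `describe` helper and the sample characters are hypothetical and chosen only for illustration.

```python
import re
import unicodedata

# Python analogy, not Perl: str patterns are Unicode-wide by default,
# while re.ASCII narrows \d and \w to their POSIX/Perl definitions.
UNICODE_DIGIT = re.compile(r"\d")          # any character in category Nd
POSIX_DIGIT = re.compile(r"\d", re.ASCII)  # exactly [0-9]
UNICODE_WORD = re.compile(r"\w")           # Unicode alphanumerics plus _
PERL_WORD = re.compile(r"\w", re.ASCII)    # exactly [0-9A-Z_a-z]

# '7', ARABIC-INDIC DIGIT THREE, LATIN SMALL LETTER E WITH ACUTE, '_'
SAMPLES = ["7", "\u0663", "\u00e9", "_"]


def describe(ch):
    """Report which of the four patterns match the single character ch."""
    return {
        "name": unicodedata.name(ch),
        "unicode_d": UNICODE_DIGIT.fullmatch(ch) is not None,
        "posix_d": POSIX_DIGIT.fullmatch(ch) is not None,
        "unicode_w": UNICODE_WORD.fullmatch(ch) is not None,
        "perl_w": PERL_WORD.fullmatch(ch) is not None,
    }


if __name__ == "__main__":
    for ch in SAMPLES:
        print(describe(ch))
```

ARABIC-INDIC DIGIT THREE is matched by the Unicode-wide \d and \w but by neither ASCII-restricted form. The accented letter is a Unicode word character, but it falls outside [0-9A-Z_a-z].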
Code that needs to match on Unicode properties can still do so with the \p{} syntax, using whichever property it requires, including the new artificial ones.
NOTE: This is a backwards-incompatible change in behaviour, and it produces no warning. If you are upgrading and process large volumes of text, look for POSIX and Perl style character classes. Where you need the old Unicode-wide matching, change them to the corresponding old property name, which is the new name with the word ‘Posix’ removed (for example, IsPosixAlnum becomes IsAlnum).
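The migration advice in the NOTE can be partly mechanised. The following Python sketch is hypothetical: the names `NEW_PROPERTY`, `old_property_name` and `restore_unicode_semantics` are invented here. It rewrites `[:name:]` and `[:^name:]` tokens in a Perl pattern string to the old Unicode-wide `\p{...}` or `\P{...}` property, deriving the old name by dropping ‘Posix’ from the new one. Its lookup table holds only the alnum and alpha rows of the table in this section. It deliberately leaves every other class, and the \d and \w escapes, untouched.

```python
import re

# Partial, illustrative table: POSIX class name -> new 5.11.0 property name.
# Only the alnum and alpha rows are included; extend it from the full table.
NEW_PROPERTY = {
    "alnum": "IsPosixAlnum",
    "alpha": "IsPosixAlpha",
}

# Matches [:name:] and the negated form [:^name:] inside a bracket class.
POSIX_CLASS = re.compile(r"\[:(\^?)([a-z]+):\]")


def old_property_name(new_name):
    # The rule from the NOTE: remove the word 'Posix' from the new name.
    return new_name.replace("Posix", "", 1)


def restore_unicode_semantics(perl_pattern):
    # Rewrite known [:name:] tokens to \p{OldName}, and [:^name:] to \P{OldName}.
    def repl(m):
        negated, name = m.group(1), m.group(2)
        new_name = NEW_PROPERTY.get(name)
        if new_name is None:
            return m.group(0)  # not in the partial table: leave unchanged
        letter = "P" if negated else "p"
        return "\\" + letter + "{" + old_property_name(new_name) + "}"

    return POSIX_CLASS.sub(repl, perl_pattern)
```

For example, `restore_unicode_semantics("[[:alpha:]_]+")` returns the Perl pattern text `[\p{IsAlpha}_]+`. Replacing only the `[:name:]` token keeps combined classes such as `[[:alpha:]_]` valid.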
The following table lists each POSIX character class name, its Perl escape (if any), the characters it matches, and its new and old Unicode property names:
POSIX  Esc  Class         New-Property  ! Old-Property
----------------------------------------+-------------
alnum       [0-9A-Za-z]   IsPosixAlnum  ! IsAlnum
alpha       [A-Za-z]      IsPosixAlpha  ! IsAlpha
ascii [
