Version 1.3 of the Portable UTF-8 library has been released. This version extends the API with 20 new string handling and utility functions. It also includes a few bug fixes and many optimizations and improvements.
You can download the library here, and the complete list of functions with explanations is available here.
New Functions
utf8_str_replace – UTF-8 aware replace all occurrences of a string with another string.
utf8_str_repeat – Repeat a UTF-8 encoded string.
utf8_str_pad – Pad a UTF-8 string to given length with another string.
utf8_strrpos – Find position of last occurrence of a char in a UTF-8 string.
utf8_remove_duplicates – Removes duplicate occurrences of a string in another string.
utf8_ws – Returns an array of Unicode White Space characters.
utf8_trim_util (for internal use) – Prepares a string and given chars for trim operations.
utf8_trim – Strip whitespace or other characters from both ends of a UTF-8 string.
utf8_ltrim – Strip whitespace or other characters from beginning of a UTF-8 string.
utf8_rtrim – Strip whitespace or other characters from end of a UTF-8 string.
utf8_strtolower – Make a UTF-8 string Lower Case.
utf8_strtoupper – Make a UTF-8 string Upper Case.
utf8_case_table – Returns an array of all lower and upper case UTF-8 encoded characters.
utf8_ucfirst – Makes string’s first char Uppercase.
utf8_lcfirst – Makes string’s first char Lowercase.
utf8_ucwords – Uppercase the first character of each word in a string.
utf8_stripos – Find position of first occurrence of a case-insensitive string.
utf8_strripos – Find position of last occurrence of a case-insensitive string.
mbstring_loaded – Checks whether mbstring is available on the server.
iconv_loaded – Checks whether iconv is available on the server.
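As a rough illustration of what extension checks like mbstring_loaded and iconv_loaded make possible, here is a minimal sketch that counts code points with a native extension when one is present and falls back to pure PHP otherwise. The sketch_* names are hypothetical and this is not the library's actual code; it only assumes PHP 7 or later with PCRE UTF-8 support.

```php
<?php
// Hypothetical helpers for illustration; not the library's implementation.

// True when the mbstring extension is loaded.
function sketch_mbstring_loaded(): bool
{
    return extension_loaded('mbstring');
}

// True when the iconv extension is loaded.
function sketch_iconv_loaded(): bool
{
    return extension_loaded('iconv');
}

// Count code points, preferring a native extension when one is present.
function sketch_strlen(string $str): int
{
    if (sketch_mbstring_loaded()) {
        return mb_strlen($str, 'UTF-8');
    }
    if (sketch_iconv_loaded()) {
        return (int) iconv_strlen($str, 'UTF-8');
    }
    // Pure-PHP fallback: with the /u modifier, "." matches one code point,
    // and the /s modifier lets it match newlines as well.
    return (int) preg_match_all('/./us', $str);
}
```

Whichever branch runs, the caller sees the same result for valid UTF-8 input, so the extension check stays an internal detail.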
Optimizations & Improvements
utf8_clean now accepts a second parameter, bool $remove_bom, to optionally remove Byte Order Marks from anywhere inside the string.
utf8_url_slug and utf8_chr have some minor code improvements.
utf8_strlen, utf8_substr and utf8_strpos now use mbstring and iconv if available. Previously these extensions were ignored because of their inconsistent behavior. This change also improves performance.
utf8_rev has been renamed to utf8_strrev, after the native string function strrev.
utf8_unicode_style_to_int has been renamed to utf8_hex_to_int.
utf8_int_to_unicode_style has been renamed to utf8_int_to_hex.
utf8_chr_to_unicode_style has been renamed to utf8_chr_to_hex.
utf8_int_to_hex and utf8_chr_to_hex have a new optional parameter that sets your preferred prefix (U+, \u, or nothing) in the returned string. By default they return code points in the pattern U+xxxx.
utf8_hex_to_int now accepts plain hexadecimal strings and the \uxxxx representation of code points, in addition to U+xxxx style strings.
utf8_chr now accepts hexadecimal code points, as utf8_hex_to_int does. An integer code point must now be passed as a strict int, or it will be treated as a hexadecimal string.
is_utf8 now uses mb_check_encoding if mbstring is available on the server.
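The code point notations described above can be sketched roughly as follows. The function names, the six-digit limit and the four-digit padding are assumptions made for illustration; the library's own functions may differ in signature and behavior.

```php
<?php
// Hypothetical sketch; names and exact rules are assumptions,
// not the library's code.

// Parse "U+xxxx", "\uxxxx" or plain hex into an integer code point.
function sketch_hex_to_int(string $hex)
{
    // Optional "U+" or "\u" prefix, then 1 to 6 hexadecimal digits.
    if (preg_match('/^(?:U\+|\\\\u)?([0-9a-f]{1,6})$/iD', $hex, $m)) {
        return hexdec($m[1]);
    }
    return false; // not a recognised code point notation
}

// Format an integer code point with a caller-chosen prefix.
function sketch_int_to_hex(int $cp, string $prefix = 'U+'): string
{
    // Pad to at least four uppercase hex digits, e.g. 0x41 becomes "0041".
    return $prefix . sprintf('%04X', $cp);
}

// A strict int is taken as a code point; anything else is parsed as hex.
function sketch_code_point($value)
{
    return is_int($value) ? $value : sketch_hex_to_int((string) $value);
}
```

Under these assumptions the int 41 stays 41, while the string '41' is read as hexadecimal and becomes 65, which is why the type of the argument matters.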
Bug Fixes
utf8_split used to return an array with one empty element when an empty string was passed to it. This has been fixed.
utf8_url_slug used strtolower. It now uses utf8_strtolower.
utf8_clean now uses a new regex syntax. The old regex caused a Connection Reset on large UTF-8 strings.
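To make the utf8_split fix concrete, here is one way a UTF-8 aware split can avoid returning an empty element for an empty string. This is a sketch using preg_split with a hypothetical function name; the library may implement its split differently.

```php
<?php
// Hypothetical sketch of a UTF-8 aware split; not the library's code.
function sketch_split(string $str): array
{
    // The empty pattern with /u splits between code points.
    // PREG_SPLIT_NO_EMPTY drops empty pieces, so '' yields [].
    $parts = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split returns false on failure (for example, invalid UTF-8).
    return $parts === false ? [] : $parts;
}
```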