Version 1.3 of the Portable UTF-8 library has been released. This version adds 20 string handling and utility functions to the API, along with a few bug fixes and many optimizations and improvements.
You can download the library here, and the complete list of functions with explanations is available here.
New Functions
utf8_str_replace – UTF-8 aware replacement of all occurrences of a string with another string.
utf8_str_repeat – Repeat a UTF-8 encoded string.
utf8_str_pad – Pad a UTF-8 string to a given length with another string.
utf8_strrpos – Find the position of the last occurrence of a char in a UTF-8 string.
utf8_remove_duplicates – Removes duplicate occurrences of a string in another string.
utf8_ws – Returns an array of Unicode white space characters.
utf8_trim_util (for internal use) – Prepares a string and given chars for trim operations.
utf8_trim – Strip whitespace or other characters from both ends of a UTF-8 string.
utf8_ltrim – Strip whitespace or other characters from the beginning of a UTF-8 string.
utf8_rtrim – Strip whitespace or other characters from the end of a UTF-8 string.
utf8_strtolower – Make a UTF-8 string lower case.
utf8_strtoupper – Make a UTF-8 string upper case.
utf8_case_table – Returns an array of all lower and upper case UTF-8 encoded characters.
utf8_ucfirst – Makes a string’s first char uppercase.
utf8_lcfirst – Makes a string’s first char lowercase.
utf8_ucwords – Uppercase the first character of each word in a string.
utf8_stripos – Find the position of the first occurrence of a case-insensitive string.
utf8_strripos – Find the position of the last occurrence of a case-insensitive string.
mbstring_loaded – Checks whether mbstring is available on the server.
iconv_loaded – Checks whether iconv is available on the server.
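To see why functions like these need to be UTF-8 aware, here is a minimal sketch of character-based padding. It is not the library's implementation: `sketch_utf8_pad_right` is a hypothetical helper written only for illustration, and it assumes PHP 7 or later with the mbstring extension loaded.

```php
<?php
// Hypothetical helper (not part of Portable UTF-8): right-pads a string
// to a length counted in characters rather than bytes.
function sketch_utf8_pad_right(string $str, int $length, string $pad = ' '): string
{
    $padLen  = mb_strlen($pad, 'UTF-8');
    $missing = $length - mb_strlen($str, 'UTF-8');

    // Nothing to do if the string is already long enough or the pad is empty.
    if ($missing <= 0 || $padLen === 0) {
        return $str;
    }

    // Repeat the pad enough times, then cut it to the exact character count.
    $repeated = str_repeat($pad, (int) ceil($missing / $padLen));

    return $str . mb_substr($repeated, 0, $missing, 'UTF-8');
}

// "\u{00E9}" is a single character (é) stored as two bytes in UTF-8.
$char = "\u{00E9}";

// Native str_pad counts bytes, so it adds only one '*' here.
$bytePadded = str_pad($char, 3, '*');

// The character-aware sketch adds two.
$charPadded = sketch_utf8_pad_right($char, 3, '*');
```

The native function treats the two-byte character as already two units long, which is the kind of mismatch the new string functions are meant to avoid.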
Optimizations & Improvements
utf8_clean now accepts a second parameter, bool $remove_bom, to optionally remove Byte Order Marks from anywhere inside the string.
utf8_url_slug and utf8_chr have some minor code improvements.
utf8_strlen, utf8_substr and utf8_strpos now use mbstring and iconv if available. Previously these extensions were ignored because of their inconsistent behavior. This change also improves performance.
utf8_rev has been renamed to utf8_strrev, after the native string function strrev.
utf8_unicode_style_to_int has been renamed to utf8_hex_to_int.
utf8_int_to_unicode_style has been renamed to utf8_int_to_hex.
utf8_chr_to_unicode_style has been renamed to utf8_chr_to_hex.
utf8_int_to_hex and utf8_chr_to_hex have a new optional parameter that sets your preferred prefix (U+, \u, or nothing) in the return string. By default these return code points in the pattern U+xxxx.
utf8_hex_to_int now accepts plain hexadecimal strings and the \uxxxx representation of code points, in addition to U+xxxx style strings.
utf8_chr now accepts hexadecimal code points as utf8_hex_to_int does. An integer code point must now be passed as a strict int, or it will be treated as a hexadecimal string.
is_utf8 now uses mb_check_encoding if mbstring is available on the server.
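The three accepted code point notations, and the strict int rule, can be illustrated with a rough sketch. The functions below are hypothetical stand-ins written for this post, not the library's code; the six-digit limit, the uppercase hex output, and the null return on bad input are all assumptions.

```php
<?php
// Hypothetical: parses "U+00E9", "\u00e9" or plain "e9" into an integer.
// Returns null when the input matches none of the three notations.
function sketch_hex_to_int(string $hex): ?int
{
    // Optional "U+" or "\u" prefix, then 1 to 6 hexadecimal digits.
    if (preg_match('/^(?:U\+|\\\\u)?([0-9A-Fa-f]{1,6})$/', $hex, $m) !== 1) {
        return null;
    }

    return (int) hexdec($m[1]);
}

// Hypothetical: formats an integer code point with a chosen prefix.
function sketch_int_to_hex(int $codePoint, string $prefix = 'U+'): string
{
    return $prefix . sprintf('%04X', $codePoint);
}

// Hypothetical: a strict int is taken as a code point as-is,
// anything else is read as a hexadecimal string.
function sketch_code_point($value): ?int
{
    return is_int($value) ? $value : sketch_hex_to_int((string) $value);
}

$fromInt    = sketch_code_point(65);   // strict int: code point 65
$fromString = sketch_code_point('65'); // string: read as hexadecimal 0x65
```

The last two calls show why the type matters: the int 65 and the string '65' name different code points once strings are read as hexadecimal.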
Bug Fixes
utf8_split used to return an array with one empty element when an empty string was passed to it. This is now fixed.
utf8_url_slug used strtolower. It now uses utf8_strtolower.
utf8_clean now uses a new regex syntax. The old regex caused a Connection Reset on large UTF-8 strings.
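For the empty string case in the split fix, one plausible behavior is to return an empty array. The sketch below shows that using only native PHP; `sketch_utf8_split` is a hypothetical function, not the library's utf8_split, and the empty-array result is an assumption about what "fixed" means here.

```php
<?php
// Hypothetical: splits a UTF-8 string into an array of characters.
// An empty string yields an empty array, not an array with one empty element.
function sketch_utf8_split(string $str): array
{
    // The /u modifier makes the empty pattern match between characters
    // instead of between bytes; PREG_SPLIT_NO_EMPTY drops empty pieces.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split returns false on invalid UTF-8 input.
    return $chars === false ? [] : $chars;
}

$empty = sketch_utf8_split('');           // no elements
$chars = sketch_utf8_split("a\u{00E9}");  // two elements: "a" and "é"
```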