Portable UTF-8 v 1.3 released

Version 1.3 of the Portable UTF-8 library has been released. This version adds 20 string handling and utility functions to the API, along with a few bug fixes and a number of optimizations and improvements.

You can download the library here, and the complete list of functions with explanations is available here.

New Functions

  • utf8_str_replace – UTF-8 aware replace all occurrences of a string with another string.
  • utf8_str_repeat – Repeat a UTF-8 encoded string.
  • utf8_str_pad – Pad a UTF-8 string to given length with another string.
  • utf8_strrpos – Find position of last occurrence of a char in a UTF-8 string.
  • utf8_remove_duplicates – Removes duplicate occurrences of a string in another string.
  • utf8_ws – Returns an array of Unicode White Space characters.
  • utf8_trim_util (for internal use) – Prepares a string and given chars for trim operations.
  • utf8_trim – Strip whitespace or other characters from both ends of a UTF-8 string.
  • utf8_ltrim – Strip whitespace or other characters from beginning of a UTF-8 string.
  • utf8_rtrim – Strip whitespace or other characters from end of a UTF-8 string.
  • utf8_strtolower – Make a UTF-8 string Lower Case.
  • utf8_strtoupper – Make a UTF-8 string Upper Case.
  • utf8_case_table – Returns an array of all lower and upper case UTF-8 encoded characters.
  • utf8_ucfirst – Makes a string's first char uppercase.
  • utf8_lcfirst – Makes a string's first char lowercase.
  • utf8_ucwords – Uppercases the first character of each word in a string.
  • utf8_stripos – Finds the position of the first occurrence of a string, case-insensitively.
  • utf8_strripos – Finds the position of the last occurrence of a string, case-insensitively.
  • mbstring_loaded – Checks whether mbstring is available on the server
  • iconv_loaded – Checks whether iconv is available on the server
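
To illustrate why UTF-8 aware functions and the extension checks matter, here is a minimal sketch that uses only native PHP functions, not the library itself. The demo_* names are hypothetical and the fallback order (mbstring, then iconv, then a regex) is an assumption, since this announcement does not show the library's internals.

```php
<?php
// Hypothetical helpers for illustration; these are NOT Portable UTF-8 functions.

function demo_mbstring_loaded(): bool
{
    // extension_loaded() is a native PHP function.
    return extension_loaded('mbstring');
}

function demo_iconv_loaded(): bool
{
    return extension_loaded('iconv');
}

// Count characters (code points) in a UTF-8 string, preferring native extensions.
function demo_strlen(string $str): int
{
    if (demo_mbstring_loaded()) {
        return mb_strlen($str, 'UTF-8');
    }
    if (demo_iconv_loaded()) {
        $len = iconv_strlen($str, 'UTF-8');
        if ($len !== false) {
            return $len;
        }
    }
    // Pure PHP fallback: with /u each "." matches one code point,
    // and /s lets "." match newlines as well.
    $count = preg_match_all('/./us', $str);
    return $count === false ? 0 : $count;
}

$word = "caf\u{00E9}";          // "café", where the last character takes 2 bytes

$byte_count = strlen($word);      // counts bytes
$char_count = demo_strlen($word); // counts characters
```

The native strlen counts bytes, so it reports 5 for this four-character word, while the UTF-8 aware count is 4 whichever branch is taken.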

Optimizations & Improvements

  • utf8_clean now accepts a second parameter bool $remove_bom to optionally remove Byte Order Marks from anywhere inside the string.
  • utf8_url_slug and utf8_chr have some minor code improvements.
  • utf8_strlen, utf8_substr and utf8_strpos now use mbstring and iconv if available. Previously these extensions were ignored because of their inconsistent behavior. This change also improves performance.
  • utf8_rev has been renamed to utf8_strrev, after the native strrev function.
  • utf8_unicode_style_to_int has been renamed to utf8_hex_to_int.
  • utf8_int_to_unicode_style has been renamed to utf8_int_to_hex.
  • utf8_chr_to_unicode_style has been renamed to utf8_chr_to_hex.
  • utf8_int_to_hex and utf8_chr_to_hex have a new optional parameter for choosing the prefix (U+, \u, or none) used in the returned string. By default they return code points in the pattern U+xxxx.
  • utf8_hex_to_int now accepts plain hexadecimal strings and the \uxxxx representation of code points, in addition to U+xxxx style strings.
  • utf8_chr now accepts hexadecimal code points in the same formats as utf8_hex_to_int. An integer code point must now be passed as a strict int; otherwise it is treated as a hexadecimal string.
  • is_utf8 now uses mb_check_encoding if mbstring is available on the server.
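
The code point formats described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the demo_* functions, their signatures, the zero-padding to four digits and the uppercase hex output are all hypothetical. Only native PHP functions are used (mb_chr requires PHP 7.2 or later with mbstring).

```php
<?php
// Hypothetical helpers for illustration; these are NOT Portable UTF-8 functions.

// Parse "U+00E9", "\u00e9" or plain "00E9" into an integer code point.
function demo_hex_to_int(string $hex): ?int
{
    // Optional "U+" or "\u" prefix, then 1 to 6 hexadecimal digits.
    if (preg_match('/^(?:U\+|\\\\u)?([0-9A-Fa-f]{1,6})$/', $hex, $m) !== 1) {
        return null;
    }
    return (int) hexdec($m[1]);
}

// Format an integer code point with a caller-chosen prefix ("U+", "\u" or "").
function demo_int_to_hex(int $code_point, string $prefix = 'U+'): string
{
    return $prefix . sprintf('%04X', $code_point);
}

// A strict int is used as-is; anything else is parsed as a hexadecimal string.
function demo_chr($code_point): string
{
    if (!is_int($code_point)) {
        $code_point = demo_hex_to_int((string) $code_point);
    }
    if ($code_point === null) {
        return '';
    }
    $chr = mb_chr($code_point, 'UTF-8');
    return $chr === false ? '' : $chr;
}

$from_int = demo_chr(233);      // int 233 is U+00E9
$from_str = demo_chr('233');    // string "233" is read as hex, i.e. U+0233
```

The last two lines show why the strict int rule matters: the integer 233 and the string '233' name different code points once strings are interpreted as hexadecimal.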

Bug Fixes

  • utf8_split used to return an array with one empty element when an empty string was passed to it. This has been fixed.
  • utf8_url_slug used the native strtolower. It now uses utf8_strtolower.
  • utf8_clean now uses a new regex syntax. The old regex caused a Connection Reset on large UTF-8 strings.
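
The first two fixes can be illustrated with native PHP functions. This is a minimal sketch, not the library's implementation: demo_split is a hypothetical name, and returning an empty array for an empty string is an assumption, since the announcement does not state the new return value.

```php
<?php
// Hypothetical helper for illustration; this is NOT the library's utf8_split.

// Split a UTF-8 string into an array of characters.
function demo_split(string $str): array
{
    // Guard the empty string explicitly (assumed intended result: no elements).
    if ($str === '') {
        return [];
    }
    // The /u modifier splits on code points; PREG_SPLIT_NO_EMPTY drops
    // the empty pieces before the first and after the last character.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    return $chars === false ? [] : $chars;
}

$empty = demo_split('');
$chars = demo_split("caf\u{00E9}");   // "café"

// Lowercasing with a multibyte aware function (requires mbstring).
// The native strtolower is byte-oriented and cannot be relied on
// for characters outside ASCII.
$lower = mb_strtolower("\u{00C9}COLE", 'UTF-8');   // "ÉCOLE"
```

A slug built from $lower keeps its accented first letter in lowercase form, which is the behavior a UTF-8 aware slug function needs.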