This module adds the ability to format a float in a good looking fashion by doing an float => string formatting.

However, the module does not do the reverse, ie. convert a formatted number string to a float. If the user sees a number in a certain format he/she is also likely to specify the number in the same way.

CommentFileSizeAuthor
#6 format_number.test5.25 KBztyx
#6 number_format.add_testing.patch2.87 KBztyx

Comments

markus_petrux’s picture

Agreed.

Suggestions for the function name? unformat_number($string) ?

ztyx’s picture

What about parse_number($string), deformat_number($string) or parse_formatted_number($string)?

Also, I suggest this function do:

  1. Try to convert to number using user selected format.
  2. Try to convert to number using global format.
  3. Try to convert to number using plain cast.
  4. Returns FALSE if all else fails.

Comments?

markus_petrux’s picture

I like parse_formatted_number($string) :)

BTW, I have to commit a few changes in the module that will turn the site/user settings form into something based on Unicode Locale Data Markup Language (LDML). The textfields for thousands separator and decimal point can be turned into radios with a known set of symbols.

If we look at the table of possible symbols for decimal point and thousands separator, we have a limited set of combinations (ref. Charts by type: Number symbols). So maybe we could proceed as follows:

  • Trim input.
  • Sign can only be accepted at the beginning or at end of input. Otherwise, return FALSE.
  • Non-numeric symbols should be present in the table of allowed symbols. Otherwise, return FALSE.
  • Only 2 different non-numeric symbols can be accepted. Otherwise, return FALSE.
  • When 2 non-numeric symbols are present, the first one can only be a thousand separator, but it should be an allowed symbol for thousand separator. If a valid match is found, strip it out. Otherwise, return FALSE.
  • Now, we can only have only 1 non-numeric symbol, that should be a valid decimal point symbol and that should be present on input once (ie. 2 decimals points is not valid). Otherwise, return FALSE.
  • Finally, our parsed input can be checked with is_numeric(). Return the number if correct. Otherwise, return FALSE.

Thoughts?

markus_petrux’s picture

Oops! One question remains... what should be returned when input is an empty string?

Options:

1) Return FALSE. But it could be confused with invalid input.
2) Return empty string ''.
3) Add a second argument to parse_formatted_number() that can tell us if number is strictly required. So if 'strict' and input is empty string, return FALSE. If not 'strict' and input is empty string return 0.
4) ... ¿?

I would vote for option 3 :)

markus_petrux’s picture

Status: Active » Needs review

I have commited function parse_formatted_number() that should be available now from CVS, or from the next nightly snapshot.

The code looks like this:

/**
 * Parse a formatted number.
 *
 * @param string $formatted_number
 *   A number formatted with localized thousands separator and decimal point.
 * @param boolean $strictly_required
 *   If input is empty string, return FALSE when number is strictly required,
 *   otherwise an empty string is returned as 0.
 * @return number
 *   A valid PHP number.
 */
function parse_formatted_number($formatted_number, $strictly_required = TRUE) {
  static $format_options, $decimal_point_options, $thousands_separator_options;
  if (!isset($format_options)) {
    $format_options = format_number_get_options();
    $decimal_point_options = format_number_get_decimal_point_options();
    $thousands_separator_options = format_number_get_thousands_separator_options();
  }

  // Trim input.
  $formatted_number = trim($formatted_number);
  if ($formatted_number === '') {
    return ($strictly_required ? FALSE : 0);
  }

  // Extract the sign.
  $is_negative = FALSE;
  if ($formatted_number[0] == '-' || $formatted_number[0] == '+') {
    $is_negative = TRUE;
    $formatted_number = drupal_substr($formatted_number, 1);
  }
  else {
    $last_char = $formatted_number[drupal_strlen($formatted_number)-1];
    if ($last_char == '-' || $last_char == '+') {
      $is_negative = TRUE;
      $formatted_number = drupal_substr($formatted_number, 0, -1);
    }
  }

  // Extract non-numeric symbols.
  preg_match_all('#[^0-9]#u', $formatted_number, $matches);
  $non_numeric_symbols = array_count_values($matches[0]);
  $non_numeric_symbols_count = count($non_numeric_symbols); 
  if ($non_numeric_symbols_count > 2) {
    // More than two different non-numeric symbols.
    return FALSE;
  }

  // When 2 non-numeric symbols are present, the first one should be the
  // thousands separator, the second one should be decimal separator.
  if ($non_numeric_symbols_count == 2) {
    // Extract and validate thousands separator.
    $thousands_sep = array_keys($non_numeric_symbols);
    $thousands_sep = array_shift($thousands_sep);
    if (!isset($thousands_separator_options[$thousands_sep])) {
      // This is not a valid thousands separator symbol.
      return FALSE;
    }
    // Strip out thousands separators.
    $formatted_number = str_replace($thousands_sep, '', $formatted_number);

    // Extract and validate decimal point.
    unset($non_numeric_symbols[$thousands_sep]);
    $decimal_point = array_keys($non_numeric_symbols);
    $decimal_point = array_shift($decimal_point);
    if ($non_numeric_symbols[$decimal_point] > 1) {
      // Decimal point symbol is used more than once.
      return FALSE;
    }
    if (!isset($decimal_point_options[$decimal_point])) {
      // This is not a valid decimal point symbol.
      return FALSE;
    }
    // Convert decimal point into dot, if necessary.
    if ($decimal_point != '.') {
      $formatted_number = str_replace($decimal_point, '.', $formatted_number);
    }
  }

  // When only one non-numeric symbol is present, we need to decide if it is a
  // thousands separator or a decimal point.
  else if ($non_numeric_symbols_count == 1) {
    $unknown_symbol = array_keys($non_numeric_symbols);
    $unknown_symbol = array_shift($unknown_symbol);

    // When unknown symbol is used more than once, it can only be a
    // thousands separator, but it should be one allowed.
    if ($non_numeric_symbols[$unknown_symbol] > 1) {
      if (!isset($thousands_separator_options[$unknown_symbol])) {
        // This symbol is not a valid thousands separator.
        return FALSE;
      }
      // Strip out unknown symbol (aka. thousands separators in this case).
      $formatted_number = str_replace($unknown_symbol, '', $formatted_number);
    }

    // When unknown symbol is not a dot (.) or a comma (,)...
    else if ($unknown_symbol != '.' && $unknown_symbol != ',') {
      if (isset($decimal_point_options[$unknown_symbol])) {
        // This is a valid decimal point symbol.
        $formatted_number = str_replace($unknown_symbol, '.', $formatted_number);
      }
      if (isset($thousands_separator_options[$unknown_symbol])) {
        // This is a valid thousands separator symbol.
        $formatted_number = str_replace($unknown_symbol, '', $formatted_number);
      }
      else {
        // This is an invalid symbol.
        return FALSE;
      }
    }

    // Unknown symbol is used once and it is a dot (.) or a comma (,).
    // Use site/user settings to decide.
    else {
      if ($unknown_symbol == $format_options['decimal_point']) {
        // This is a valid decimal point symbol.
        $formatted_number = str_replace($unknown_symbol, '.', $formatted_number);
      }
      else if ($unknown_symbol == $format_options['thousands_sep']) {
        // This is a valid thousands separator symbol.
        $formatted_number = str_replace($unknown_symbol, '', $formatted_number);
      }
      else {
        // This is an invalid symbol.
        return FALSE;
      }
    }
  }

  return ($is_negative ? '-' : '') . $formatted_number;
}

Please, check it out and let me know if that covers what you had in mind. :)

ztyx’s picture

StatusFileSize
new2.87 KB
new5.25 KB

I like your code. I've attached a patch against current CVS changing a pair of things:

  1. Makes the code more safe in terms of static. You only checked one of the three static variables.
  2. Also, a few test cases for this wouldn't hurt. Therefor, I added a few parameters so that the settings could be attached to the calling function. This will add two more conditional cases per run. I know the function should be optimized for speed, but don't think these two conditional cases will slow things down too much.

I'm also attaching a sketch for tests. Have a look at them and see if you find them reasonable. I do. Currently 6 tests out of 51 fails.

markus_petrux’s picture

Status: Needs review » Fixed

If you would have to use format_number() passing explicit thsep/decpt options, you could simply use PHP's number_format(), so I wouldn't add the option to format_number().

Also, I wouldn't patch parse_formatted_number() just for testing purposes. If that's really needed, the test environment should be able to reproduce the variable conditions that a function may find. Maybe it can be created more than one SimpleTest file, each one able to predefine site/user settings.

SimpleTest is a great tool for QA, but we could better use a separate issue for that. AFAICT, the "Reverse number format" issue is fixed. :)

markus_petrux’s picture

  // Should never happen. If it does, it will break the module.
  assert($format_options['thousands_sep'] != $format_options['decimal_point']);

That's not really needed either, since settings form (both, site wide and user profile forms) implement the options with radios (impossible to select anything that is not predefined) and also provide a validation routine to ensure the same character is not selected for both symbols.

ztyx’s picture

SimpleTest is a great tool for QA, but we could better use a separate issue for that. AFAICT, the "Reverse number format" issue is fixed. :)

Well, I see SimpleTest as a great way of writing a specification too. And since six of the tests failed (all in the testing of parse_formatted_number()), the specification did not hold.

Anyway, I'll create a new issue for this after refactoring the code a bit further.

markus_petrux’s picture

Status: Fixed » Closed (fixed)