I've been experimenting with using OTS (Open Text Summarizer, https://packages.debian.org/jessie/libots0) to help build Google-like summaries for body text, and thought I'd share the results. In the very rough code below, $string1 is the text of the Drupal.org "About" page and $string2 is the summary OTS returns at a 40% ratio. The function get_longest_common_subsequence() is from https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Longest_c...

The output of the code below is:

Build something amazing, for anyone

Drupal is content management software .... Drupal has great standard features, like easy content authoring, reliable performance, and excellent security .... Its tools help you build the versatile, structured content that dynamic web experiences need .... Modules expand Drupal's functionality .... Distributions are packaged Drupal bundles you can use as starter-kits. Mix and match these components to enhance Drupal's core abilities. Or, integrate Drupal with external services and other applications in your infrastructure. No other content management software is this powerful and scalable .... Drupal will always be free ...

I thought that was actually quite interesting.

$string1 = "The digital experiences you love. The organizations you trust most. The software they depend on.
Build something amazing, for anyone

Drupal is content management software. It's used to make many of the websites and applications you use every day. Drupal has great standard features, like easy content authoring, reliable performance, and excellent security. But what sets it apart is its flexibility; modularity is one of its core principles. Its tools help you build the versatile, structured content that dynamic web experiences need.

It's also a great choice for creating integrated digital frameworks. You can extend it with any one, or many, of thousands of add-ons. Modules expand Drupal's functionality. Themes let you customize your content's presentation. Distributions are packaged Drupal bundles you can use as starter-kits. Mix and match these components to enhance Drupal's core abilities. Or, integrate Drupal with external services and other applications in your infrastructure. No other content management software is this powerful and scalable.

The Drupal project is open source software. Anyone can download, use, work on, and share it with others. It's built on principles like collaboration, globalism, and innovation. It's distributed under the terms of the GNU General Public License (GPL). There are no licensing fees, ever. Drupal will always be free.";

$string2 = "Build something amazing, for anyone

Drupal is content management software. Drupal has great standard features, like easy content authoring, reliable performance, and excellent security. Its tools help you build the versatile, structured content that dynamic web experiences need. Modules expand Drupal's functionality. Distributions are packaged Drupal bundles you can use as starter-kits. Mix and match these components to enhance Drupal's core abilities. Or, integrate Drupal with external services and other applications in your infrastructure. No other content management software is this powerful and scalable. Drupal will always be free.";

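$matches = array();
$ordered = array();

// Bound the loop up front: $string2 shrinks on every pass, so re-reading
// strlen($string2) in the loop condition could end the loop too early.
$max_passes = strlen($string2);

for ($i = 0; $i < $max_passes; $i++) {
  // Longest run of text that the source and the remaining summary share.
  $match = get_longest_common_subsequence($string1, $string2);
  if (!$match) {
    break;
  }
  // Remove the run from the summary so the next pass finds the next-longest one.
  $string2 = str_replace($match, '', $string2);
  // Trim leading non-letters and trailing non-alphanumerics (spaces, full stops).
  $match = preg_replace('/^[^A-Za-z]+/', '', $match);
  $match = preg_replace('/[^a-z0-9]+\Z/i', '', $match);
  if ($match) {
    $matches[] = $match;
  }
}

// Key each fragment by its offset in the source so ksort() restores reading order.
foreach ($matches as $item) {
  $ordered[strpos($string1, $item)] = $item;
}
ksort($ordered);

// dpm() comes from the Devel module; use print outside Drupal.
dpm(implode(' .... ', $ordered) . ' ...');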
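
The script above calls get_longest_common_subsequence(), which isn't included in the post. Because the result is passed straight to str_replace(), it presumably returns a contiguous run (a longest common substring) rather than a true subsequence. For anyone who wants to try this without the Wikibooks code, here is a minimal sketch under that assumption. The name lcs_substring_sketch() is hypothetical and this is not the Wikibooks implementation, just a plain dynamic-programming stand-in.

```php
<?php
/**
 * Hypothetical stand-in for get_longest_common_subsequence().
 * Assumption: the caller wants the longest CONTIGUOUS run shared by $a and $b.
 * Works on bytes, so multibyte text would need the mb_* functions instead.
 */
function lcs_substring_sketch($a, $b) {
  $len_a = strlen($a);
  $len_b = strlen($b);
  $best_len = 0;  // Length of the longest shared run found so far.
  $best_end = 0;  // Offset in $a just past the end of that run.

  // $prev[$j] holds the length of the shared run ending at $a[$i - 2], $b[$j - 1].
  $prev = array_fill(0, $len_b + 1, 0);

  for ($i = 1; $i <= $len_a; $i++) {
    $curr = array_fill(0, $len_b + 1, 0);
    for ($j = 1; $j <= $len_b; $j++) {
      if ($a[$i - 1] === $b[$j - 1]) {
        // Extend the run that ended one character earlier in both strings.
        $curr[$j] = $prev[$j - 1] + 1;
        if ($curr[$j] > $best_len) {
          $best_len = $curr[$j];
          $best_end = $i;
        }
      }
    }
    $prev = $curr;
  }

  return $best_len > 0 ? substr($a, $best_end - $best_len, $best_len) : '';
}
```

It runs in time proportional to strlen($a) * strlen($b) on every call, and the loop above calls it once per fragment, so it is fine for a page-sized body but probably too slow for long documents.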


lightsurge created an issue.