To broaden the scope of Freelinking, we need a method by which plugins can specify the syntax used to activate them. To smoothly scale up to supporting many different syntax structures, and allow them to vary by plugin, there's some groundwork to put in place.

  1. The basic architecture of freelinking_filter() needs to build a complete regular expression to match all instances of a given plugin with a single preg_match_all(). The indicator will not come along in a secondary stage. The indicator will no longer include pattern modifiers or expression delimiters. This has the minor side effect that links are no longer processed in the order they appear in the text.
  2. Support for a single default plugin when multiple "bracket matching" schemes are possible is unnecessarily limiting. Instead, Plugin Weights should be used, giving precedence to the lowest-weight plugin using a given bracket-matching scheme. This creates implicit default plugins for every type of syntax, and also results in automatic failover to "higher weight" plugins in the event a given plugin fails with a no-effect result.
  3. The current "Default Plugin" setting should be used to create a magical weight override that pushes the selected plugin to the top of the stack. It should be optional and play nice with #634348: Configuration by Input Format.
  4. The current "Syntax Selection" setting should be used to create a default in the event a plugin does not care to define a syntax. This should also play nice with #634348: Configuration by Input Format.
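
Points 2 and 3 can be sketched roughly as follows. This is a minimal illustration only: the registry keys ('weight', 'syntax', 'callback'), the function name example_resolve() and the convention that FALSE means "no-effect" are all assumptions made for the example, not the existing Freelinking API.

```php
<?php
// Hypothetical plugin registry (keys are assumptions for illustration).
$plugins = array(
  'nodetitle' => array(
    'weight' => 0,
    'syntax' => 'double bracket',
    // Returning FALSE stands in for a "no-effect" result.
    'callback' => function ($target) { return FALSE; },
  ),
  'search' => array(
    'weight' => 10,
    'syntax' => 'double bracket',
    'callback' => function ($target) { return 'search/node/' . $target; },
  ),
  'user' => array(
    'weight' => 5,
    'syntax' => 'at prefix',
    'callback' => function ($target) { return 'user/' . $target; },
  ),
);

function example_resolve($syntax, $target, array $plugins, $default = NULL) {
  // Optional "Default Plugin": a weight override that pushes it to the top.
  if ($default !== NULL && isset($plugins[$default])) {
    $plugins[$default]['weight'] = -PHP_INT_MAX;
  }
  // Keep only the plugins that use this bracket-matching scheme.
  $candidates = array_filter($plugins, function ($p) use ($syntax) {
    return $p['syntax'] === $syntax;
  });
  // Lowest weight first.
  uasort($candidates, function ($a, $b) {
    if ($a['weight'] == $b['weight']) {
      return 0;
    }
    return $a['weight'] < $b['weight'] ? -1 : 1;
  });
  // Fail over to the next plugin when one reports no effect.
  foreach ($candidates as $name => $plugin) {
    $result = call_user_func($plugin['callback'], $target);
    if ($result !== FALSE) {
      return array($name, $result);
    }
  }
  return FALSE;
}

$hit = example_resolve('double bracket', 'Some page', $plugins);
```

Here 'nodetitle' is tried first because it has the lowest weight for the double bracket scheme; since it reports no effect, 'search' handles the link.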

What is bracket-matching syntax?

[[link]], [link], #link, @link, link@plugin, [Link Text](link), and so on.

Comments

arhak

welcome delimiters!!

but be careful, IMO you'll have (should prefer) to deal with open/closing delimiters
otherwise having just an opening delimiter (@link) would match against what ending?
EOL (end of line), non-word chars? spaces? tabs?

if you start with couple of delimiters you'd support many wiki-style syntaxes
then you might want to move on against opening-EOL or SOL-closing (Start Of Line)

but you shouldn't try to automatically recognize what is word or non-word, because it might become a tricky path

arhak

also, a case like [Link Text](link) is even wider than the others

what would that be? an opening square bracket against a closing parenthesis?
here you're hitting the edge of a more powerful wiki-syntax tool than what FL3 is currently aiming for

maybe for FL4?

Grayside

[Link Text](link) is Markdown-style, and basic support for that was folded into Alpha3 as a global option. If testing proves it's too complex, it may be dropped before a full 3.0 release.

I had some vague thought that prefix-only delimiters would terminate with the first non-escaped whitespace character (So #This\ is\ fine), but I am still contemplating how to build a flexible system, rather than the particulars of how to implement a given regular expression.
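
For what it's worth, the "first non-escaped whitespace" idea could look something like the sketch below. The pattern and the unescaping step are only one possible reading of that idea, written for illustration; nothing here is settled syntax.

```php
<?php
// Sketch only: a '#' prefix whose target runs up to the first whitespace
// character that is not preceded by a backslash.
$pattern = '/#((?:\\\\.|[^\s\\\\])+)/u';
$text = 'See #This\ is\ fine and #other for details.';

preg_match_all($pattern, $text, $matches);

// Drop the escaping backslashes from each captured target.
$targets = array_map(function ($raw) {
  return preg_replace('/\\\\(.)/u', '$1', $raw);
}, $matches[1]);

// $targets is now array('This is fine', 'other').
```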

arhak

"prefix-only", "whitespace character"

what would a prefix-only be in a right-to-left language?
is every language ok with the definition of "whitespace character"?

Grayside

Absolutely, there are definitely considerations in doing this. Before I try to address your points I want to continue working through the basic approach of how plugins will specify their matching scheme, as that will inform how we approach the complexities of multiple languages.

There are two basic approaches to granting plugins the ability to define their own matching scheme. They are not mutually exclusive:

  1. Plugins can specify the use of a bracketing scheme defined by Freelinking. (I.e., 'match' => 'double bracket')
  2. Bracketing schemes get their own definition specification and plugins may define whatever they want. Perhaps something like:
    'match' => array(
      'expression' => '/\b#(.+)@(.+):(.+)\b/Uu', 
      'indicator' => 2,
      'target' => 1, 
      'arguments' => 3,
      'argument separator' => '|',
      'example' => '#target@indicator:text|tooltip|arg1'
    )
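
To show how such a 'match' array might be consumed, here is a rough sketch. The function name example_apply_match() is hypothetical, and the expression is a simplified stand-in chosen for the example: it drops the leading \b (since '#' is not a word character, '\b#' only matches when a word character sits directly before it) and ends the link at the first whitespace.

```php
<?php
// Sketch only; the expression is an assumption, simplified from the one above.
$match = array(
  'expression' => '/#([^@\s]+)@([^:\s]+):(\S+)/u',
  'indicator' => 2,
  'target' => 1,
  'arguments' => 3,
  'argument separator' => '|',
);

function example_apply_match(array $match, $text) {
  $links = array();
  preg_match_all($match['expression'], $text, $found, PREG_SET_ORDER);
  foreach ($found as $groups) {
    // The numeric entries of the match array say which capture group
    // holds which part of the link.
    $links[] = array(
      'indicator' => $groups[$match['indicator']],
      'target' => $groups[$match['target']],
      'arguments' => explode($match['argument separator'], $groups[$match['arguments']]),
    );
  }
  return $links;
}

$links = example_apply_match($match, 'Try #target@indicator:text|tooltip|arg1 here.');
// $links[0]['target'] is 'target', $links[0]['indicator'] is 'indicator',
// $links[0]['arguments'] is array('text', 'tooltip', 'arg1').
```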
    
arhak

ungreedy \b#(.+?)@(.+?):(.+)\b
or reserved delimiter \b#([^@]+)@([^:]+):(.+)\b
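
A small comparison of the three styles, reduced to the first capture group so the difference is easy to see (the sample text is made up for the example):

```php
<?php
$text = '#one@p and #two@q';

preg_match('/#(.+)@/', $text, $greedy);      // default greedy quantifier
preg_match('/#(.+?)@/', $text, $lazy);       // ungreedy quantifier
preg_match('/#([^@]+)@/', $text, $reserved); // '@' reserved as a delimiter

// $greedy[1] runs to the last '@': 'one@p and #two'
// $lazy[1] and $reserved[1] both stop at the first '@': 'one'
```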

Grayside

Thanks for the tweaks, still has the problems of #4. The purpose of my post was to explore the match array structure. Is that complete enough?

In Option #1, we have the problem of pushing the plugin's indicator into the expression. We no longer want wildcard indicators unless we want a universal fallback syntax ([[plugin:target]] always works, but the plugin specifies #target). That means we need to define indicators in general-use match specifications in a way that allows them to be stripped out or replaced with something plugin-specific.

array(
  'indicator' => 'plugin',
  'match' => array(
    'expression' => '\b#(.+?)INDICATOR:(.+)\b',
    'indicator prefix' => '@',
    'indicator suffix' => '',
    'target' => 1,
    'arguments' => 2,
  )
);

"\b#(.+?)INDICATOR:(.+)\b" becomes "\b#(.+?)@plugin:(.+)\b"
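
The substitution itself would be something along these lines. The helper name example_build_expression() is hypothetical, and preg_quote() is used on the assumption that the indicator is meant as literal text:

```php
<?php
$plugin = array(
  'indicator' => 'plugin',
  'match' => array(
    'expression' => '\b#(.+?)INDICATOR:(.+)\b',
    'indicator prefix' => '@',
    'indicator suffix' => '',
    'target' => 1,
    'arguments' => 2,
  ),
);

function example_build_expression(array $plugin) {
  $match = $plugin['match'];
  // Treat the indicator as literal text, then wrap it in prefix and suffix.
  $indicator = $match['indicator prefix']
    . preg_quote($plugin['indicator'], '/')
    . $match['indicator suffix'];
  return str_replace('INDICATOR', $indicator, $match['expression']);
}

$expression = example_build_expression($plugin);
// $expression is '\b#(.+?)@plugin:(.+)\b'
```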

arhak

Is that complete enough?

the array structure of #5 looks good, legible enough, wide enough
(nevertheless there will be always cases out of its scope)

#7 becomes pretty awful/unreadable
expression looks almost like a regex but INDICATOR will be replaced by indicator prefix/suffix ...
NO please, it will be madness (IMO)

arhak

arguments & argument separator are a good idea/approach
but, for instance, image/video filters might use arguments in more than one position
and have complex separators like

size=640x480
size=80%

I mention this, just to point out that some image/video filters can provide use cases to test whether you're being flexible enough

BUT I don't think you should aim to cover them all

Grayside

There is a sort of unified structure for arguments. A routine will parse out the arguments into something like $target['size'] = '80%'.
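
As a rough idea of what such a routine might do (the function name example_parse_arguments() and the key=value convention are assumptions for this sketch, not the actual routine):

```php
<?php
function example_parse_arguments($raw, $separator = '|') {
  $parsed = array();
  foreach (explode($separator, $raw) as $part) {
    if (strpos($part, '=') !== FALSE) {
      // Named argument such as size=80%.
      list($key, $value) = explode('=', $part, 2);
      $parsed[trim($key)] = trim($value);
    }
    else {
      // Positional argument, kept in order of appearance.
      $parsed[] = trim($part);
    }
  }
  return $parsed;
}

$target = example_parse_arguments('Link text|size=80%|align=left');
// $target[0] is 'Link text', $target['size'] is '80%'.
```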

I agree that the INDICATOR token thing is sloppy. Here is what we are discussing now:

Option 1

Much like it currently works, you may specify an indicator and it will use that to match against the global "default" in bracket matching. This will function as it currently does.
'indicator' => 'nt|nodetitle|title'
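
For illustration, assuming the global default is the double bracket form [[indicator:target]], the indicator string could be dropped into the expression as an alternation like this (the pattern is a sketch, not the module's actual expression):

```php
<?php
$indicator = 'nt|nodetitle|title';

// The indicator is used as a regex alternation inside the default syntax.
$pattern = '/\[\[(' . $indicator . '):(.+?)\]\]/u';
$text = 'Read [[nt:Home]] and [[title:About us]], not [[user:admin]].';

preg_match_all($pattern, $text, $found, PREG_SET_ORDER);
// Two matches: ('nt', 'Home') and ('title', 'About us').
// [[user:admin]] is left for whichever plugin claims 'user'.
```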

Option 2

The array structure from #5, and you may make it as specific or vague as you want.

arhak

right now I'm on the fence...

it should be the developers' call: what would the majority of developers prefer?

option 1 seems very straightforward
while option 2 seems more powerful/flexible

maybe start with the straightforward one until the plugin system becomes so popular that a wider API is needed

Grayside

It seems to me the options are not mutually exclusive. In fact, if you define both an 'indicator' and a 'match' both could be used.

gisle

Version: 6.x-3.x-dev » 7.x-3.x-dev
Status: Active » Closed (won't fix)

Five years ago, Grayside wrote:

The basic architecture of freelinking_filter() needs to build a complete regular expression to match all instances of a given plugin with a single preg_match_all().

While this provides excellent flexibility, and markdown and single bracket are already partially implemented for version 3, this approach is not without disadvantages:

  • It is hard to provide consistent parsing throughout the project for all the possible syntaxes. There are already some rather ugly hacks in place to support the markdown style markup and there need to be a lot more to make the implementation of markdown complete.
  • It is hard to document all the syntactical quirks that this allows. Indeed, only the double square bracket syntax has been documented. The way markdown is supposed to work within this module can only be found out by reading code.
  • The preg_match_all() approach breaks when replacing markup where the same target is linked with two different title attributes. For example, given: [[nid:1|foo]] [[nid:1|bar]] , the "match all" will result in anchor text (title) of the second link being set to "foo" instead of "bar" as the user would expect.
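
One way to avoid the last problem is to replace each occurrence individually, for example with preg_replace_callback(). The sketch below only illustrates that idea: the function name, the pattern and the generated markup are assumptions made for the example, and it should not be read as a description of what the module does.

```php
<?php
function example_replace_freelinks($text) {
  // Each match is rewritten on its own, so two links to the same target
  // keep their own anchor text.
  return preg_replace_callback(
    '/\[\[nid:(\d+)\|([^\]|]+)\]\]/u',
    function ($m) {
      return '<a href="/node/' . $m[1] . '">' . htmlspecialchars($m[2]) . '</a>';
    },
    $text
  );
}

$html = example_replace_freelinks('[[nid:1|foo]] [[nid:1|bar]]');
// $html is '<a href="/node/1">foo</a> <a href="/node/1">bar</a>'
```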

Because of these disadvantages, I am planning to pull back from this approach. The upcoming Freelinking 7.x-3.4 release will only support the double square bracket syntax.

  • gisle committed 05eb3c3 on 7.x-3.x
    #647940 by gisle: Changed parsing of freelinks to not use preg_match_all...