Splitting this off from #372743: Body and teaser as fields

I did some benchmarking and profiling of a front page with 10 identical nodes before and after body and teaser as field.

Benchmarks came out 40% slower with field API.

Profiling shows at at least 30% of the 40% isn't just some extra cache_get() for the field cache and check_markup() on both body and summary but is down to several hundred additional function calls from converting a single field.

This is a major concern, since I've had plans for multiple getting of the field cache, and caching formatted text at either the field or object level to save database hits, whereas a good portion of the slowdown is in PHP, and on initial look it seems like at least some will be per field instance rather than per object.

A good portion of the extra time spent is in the following:

40 additional calls to theme()
30 additional calls to drupal_render()
100 additional calls to drupal_function_exists()
80 additional calls to check_plain()

That seems a lot for 10 fieldable objects with one field each.

I've attached screenshots from webgrind.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

yched’s picture

Very interesting, thanks for doing this, catch !

So per node that's:
4 additional calls to theme()
3 additional calls to drupal_render()
10 additional calls to drupal_function_exists()
8 additional calls to check_plain()

Off-hand before going to bed :
- I don't explain the 4 theme() calls. There's one theme call per field value (formatters are theme functions), plus one for the field wrapper, so that's 2. Dunno where the other 2 come from right now.
- 3 calls to drupal_render() match the 3 level deep structure of the render array for a field, as documented in the PHPdoc of field_default_view().
- drupal_render() does 2 calls to drupal_function_exists(), and theme() does one, so we get
(2 x 3 drupal_render)+ 4 theme = 10 calls to drupal_function_exists
- I don't explain the 8 check_plain() calls.

Looking at the 'total inclusive cost' column, it looks like only theme() and drupal_render() overheads are significant, though.

catch’s picture

Some of the check_plain() are due to template_preprocess_field() - presumably on both the body and summary labels which then don't get rendered. That explains 20 of the additional check_plain(), didn't look into the rest yet.

You'll need the actual files and either webgrind or kcachegrind to be able to view which functions are calling which - and they're around 1MB each. I think that means I could get away with uploading them to this issue, but that'll have to wait until tomorrow.

catch’s picture

And here they are.

catch’s picture

Looks like I didn't actually post the benchmarks here:

/node - 40% slower
HEAD:
10.50 requests / second

Body and teaser as field:
7.40 requests / second
node/1 - 5% slower

HEAD:
14.69 / 14.74 requests/second

Body and teaser as field:
13.83 / 14.01 requests/second

I was hoping that some of this would be a flat cost simply from invoking Field API, but it looks like most of the hit is per object.
This and the profiling suggests that we're actually going to see a slowdown of between 4-5% per field, per fieldable object. i.e. we'll see function calls * n for the worst-performing functions, for every field added to an object (theme(), drupal_render(), template_preprocess_field() etc.)

yched’s picture

The profiling info you posted in #3 show a total processing time for /node page with 10 nodes (1 body field per node) of 28 ms (HEAD) vs 35 ms (Body as field - BAF). That would be a 33% slowdown for BAF compared to HEAD, not 40% ?

Here's what I could find so far :

Total overhead : 7ms
1.2 ms for field_attach_load() on nodes
0.1 ms for field_attach_load() on users (no fields on users)
5.5 ms for template_preprocess_node()
+ bits here and there, I didn't really track the remaining .3ms

Focusing on the render part for now :

- template_preprocess_node accounts for 5.5ms :
average .75 ms in HEAD
average 1.3 ms in BAF
.55 ms x 10 calls = 5.5 ms

- of this, drupal_render($node->content) overhead is 2 ms
not measurable (less than .1ms) in HEAD
average .2ms in BAF
Almost 100% of this time is spent in template_preprocess_field.
Of this, rendering the values and calling formatters in drupal_render_children() is about half the time (1ms)
Oddly enough, the summary generation, which is done at display time in BAF (as opposed to node save in HEAD) costs .2ms for one of the nodes, and is not measurable for the others.

Couldn't find where the other 3.5 ms for the rendering part are for now.

I don't clearly see room for optimizations in there for now...

Maybe I should profile this on my machine, which is probably slower, to get more significant times. Tracking how 7ms / 10 break down when less than .1 ms is not measurable is tricky.

catch’s picture

FileSize
1.11 MB

Hmm, thought I posted a followup but looks like it never made it.

The cachegrind is one request with the overhead of profiling. The benchmarks were 500 requests with no profiling - so +/- 7% seems likely given the difference.

On summary - I had trouble running the update so manually created some nodes with summary field populated - here's a straight upgrade of the 10 nodes from the HEAD site.

catch’s picture

Also, in cachegrind or webgrind you should be able to display microseconds instead of milliseconds - which if you want to track down details may help. For before after comparisons, set xdebug to append contents of cachegrind files and then hit refresh 10 times when profiling (and rename the files afterwards before doing the next batch of course) - then you get aggregated function calls across ten requests which'll show variation more clearly.

moshe weitzman’s picture

subscribe.

yched’s picture

For the record, I've been spending some time with this. Rather painful, because kCacheGrind offers no easy way (at least none that I know) to track one specific instance of functions like drupal_render, that are used all around a page request. For instance : isolate from the profile logs the 10 executions of drupal_render($nope->content) across 10 node displays. Plus the highly recursive nature of drupal_render seems to confuse kCacheGrind's visualisations...
On top of that, the profile logs I generate on my Windows setup don't seem to be parsed correctly by both webcachegrind or kcachegrind, so I'm reduced to using wincachegrind, which has its own set of restrictions.

In short, the overhead mainly seems to be the cost of moving from a plain and simple

$node->content['body'] = array(
  '#markup' => $teaser ? $node->teaser : $node->body,
  '#weight' => 0,
);

to a nested render structure, themed through a template. The template discovery itself (with 4 template variants...), performed on each themed field, costs quite a few cycles. I'll try to hack a quick-n-dirty version with simplified field rendering so that we can compare the benchmarks and check that this is indeed where we should be looking at.

catch’s picture

node_render() and template have been shown to be expensive before, so it seems likely they're the culprits here too. Nice work looking into this!

bjaspan’s picture

yched’s picture

I'm not set up nor used to run benchmarks, so I'm hoping catch will be willing to help on this :-)

I just rerolled the BAF patch in #372743-141: Body and teaser as fields against latest HEAD.
Attached are 3 patches, each equal to BAF + 1 "quick'n'dirty" simplification in the render process, corresponding to the most outstanding overhead costs I could isolate in the profiling logs provided by catch. I kept them separate so that we can evaluate how much we gain from each.

- 'opt1' : replace a probably unneeded drupal_render_children() by a direct theme call to the formatter when outputting individual field values. Probably makes sense anyway, this drupal_render_children() call was introduced during a hurry hotfix when drupal_render() refactoring was committed - I doubt this will buy us much, though.
- 'opt2': do not look for template suggestions in addition to field.tpl.php. Loses theming flexibility.
- 'opt3': use a theme_field() function instead of a field.tpl.php template. Loses theming flexibility + friendly theme override.

@catch:
Is it possible you bench "/node page - 10 promoted story nodes" on :
- HEAD (would be best to re-bench latest HEAD if we want fair comparisons, I guess ?)
- latest BAF patch in #372743-141: Body and teaser as fields
- each of the 3 patches below

yched’s picture

FileSize
3.1 KB

Finally managed to come up with benchmarks with semi-reasonable standard deviations.
Here are the results I came up with (my setup is obviously much slower than catch's...)

/node - 10 nodes displayed
HEAD : 2.99 req/s - 334ms/request
BAF: 2.26 req/s - 441ms/request - 32% slower
BAF opt 1: 2.48 req/s - 403ms/request - 20% slower
BAF opt 2: 2.64 req/s - 378ms/request - 13% slower
BAF opt 3: 2.70 req/s - 369ms/request - 10% slower

I've seen some pretty large differences between several runs of the benchmarks, though, so we could use a confirmation by someone else for 'BAF opt 2 vs. HEAD' and 'BAF opt 3 vs. HEAD'.

Full benchmark logs attached.

catch’s picture

Getting failed hunks in node.module for all the patches...

yched’s picture

Yes, all patches got broken after yesterday's http://drupal.org/cvs?commit=206232. Keeping the BAF patch to date for more tha 24hrs is a challenge :-(.
At first look, the changes involved in this case are more than a simple reroll. Can we agree on using a version of HEAD at the date the patch were rolled, for now ?

stewsnooze’s picture

Uploading some more raw xdebug "grinds" from today's HEAD.

Feel free to run them through any cachegrinds such as kCacheGrind, WinCache Grind or MacCallGrind

field_attach_load() is still over 21% of the time spent during the node_load_multiple

stewsnooze’s picture

FileSize
773.26 KB

I fell at the last post!

yched’s picture

Thks stewsnooze.
I'm not too worried about field_attach_load(). node_load_multiple() accounts for 6% of the whole request, so 20% of 6% is 1.5% for field_attach_load(). Not the biggest issue we have, and should be largely mitigated by #333171: Optimize the field cache and #362024: Field Attach: Make hook_field_load() be 'multiple' like field_attach_load().

For the rendering part, which seems to account for most of the 'field' overhead, we might want to move away from field templates (high cost in terms of file discovery, esp. on listing pages with several fields on each node), and use dynamic theming instead : function theme_field__field_name__bundle()
Would let us keep the theming flexibility allowed by template suggestions (by field, by bundle, by field+bundle...), and function discovery is cached in the theme registry, unlike template discovery. The HTML is complex enough to deserve a template, but it seems we cannot really afford it performance-wise...

I'd still welcome a confirmation of 'opt-3 VS HEAD' benchmarks, to see where exactly this would put us in terms of 'field API overhead', and if there are other places to look into.

catch’s picture

Status: Active » Needs work
Issue tags: +Performance

The bulk of field_attach_load() is spent in cache_get() - which should hopefully reduce significantly once #333171: Optimize the field cache is attached since it'll hit the database 1/10th as much.

I haven't had a chance to run benchmarks on the various patches from yched in #12 yet and might not have time this week. Bumping to CNW since at least option #1 looks like it's both an improvement and not a regression.

moshe weitzman’s picture

Status: Needs work » Active
Issue tags: -Performance

That would be pretty disappointing to not use our template system because we are worried about its overhead. We are using it for blocks and those quite frequently used. Should we ditch block.tpl.php as well? I'm circling merlinofchaos in here - maybe he has some ideas.

Maybe in field_theme we should declare many more entries. Today, we just have a 'field' entry. We should perhaps add one for each instance. That would move more work to the theme registry as the number of suggestions would be reduced. No idea if suggestions is really the problem. At the time we are theming something, we pass an array like theme(array(hook1, hook2, ...)). Passing an array to theme() is commonly used by Views but not many others.

It is a real problem that we have no UI in core for creating fields. How are developers supposed to learn and improve field api without a working UI? Reading tests and running bare PHP is a poor option. cck.module was last updated 5 weeks ago, and does not work anymore. And obviously keeping body as field patch up to date is a hard solution too. Besides manpower, are there significant obstacles to moving the field ui to core? Point me to another issue if we should not discuss that here. I could not find one.

yched’s picture

Status: Active » Needs work
Issue tags: +Performance

Crosspost ? resetting status and tags.

- True about blocks, but fields are likely to be much more numerous than blocks on listing pages ?
- Template suggestions are especially costly since for each rendered field, up to 4 (and this worst case is the most general one) files are searched for - ok, same number of suggestions for block templates... The benchmarks I ran seem to hint that field template discovery accounts for a 20% slowdown for 10 nodes with one 'body' field each. Cost increases when we move more stuff to 'as field', or when nodes have more user-added fields...
- the dynamic theming should be transparent : we set '#theme' => array(...) instead of '#theme' => 'field' in the render_structure built in field_default_view(). It's mainly used by Views in D6 land, but it's still part of the official core theme API, right ?
- I'm not sure I see how adding more 'field' entries to the theme registry would help, if we want one field.tpl.php template to catch them all by default.

Note that I'm not sold on this myself, I'm just trying to see what our options are...

Field UI : Developers should be able to learn and improve field api using CCK HEAD. If it's broken let's fix it :-) I'll try to look into it, can you be more specific as to what doesn't work, and open an issue for CCK HEAD ?
Other than that, yep, 'Field UI in core' is definitely not for this thread. There is no existing thread because the official position so far was 'UI remains in contrib', but we can open a new thread if it appears this won't fly.

stewsnooze’s picture

Status: Needs work » Active

re #20 I thought I was going insane looking to add fields to content types. We should get something trivial committed asap.

stewsnooze’s picture

Status: Active » Needs work
bjaspan’s picture

@stewsnooze: Please scan http://api.drupal.org/api/group/field/7. Then, whatever you want to be able to do that isn't clear to you, please post on http://jaspan.com/field-api-what-would-you-know. Thanks!

moshe weitzman’s picture

I'm pretty well along with a small test module which auto-creates fields and test data upon install. Then there is a menu callback which displays all the entities, but omits theme('page'). I think this will help us isolate the performance issues. I've also added timers throughout.

I got frustrated with webgrind. I think we'll still need webgrind and friends, but at least the data set will be cleaner.

Will post here later this weekend.

chx’s picture

Cant we simply cache the theme discovery...?

bjaspan’s picture

@chx: I thought about that for a bit too. The reason I concluded we can't is that the theme lookup might be based on object id (e.g. node-17.tpl.php). On a page full of O objects each with N distinct fields, we will still need O*N theme lookups, and none of them will be reused, so caching won't help.

yched’s picture

Maybe what chx was suggesting is that the cache registry holds the list of all template files actually found. File discovery happens at registry build time, and at theme() runtime we just pick the right one according to the actual object being themed. This involves a series of if (isset($templates[$variant_filename])) rather than a series of file_exists().

That's assuming you won't have 1000s of templates for node-[nid].tpl.php pattern.
I might completely miss some other obvious reasons why that approach wouldn't fly, though...

catch’s picture

If you've got thousands of node-[nid].tpl.php files then you've got bigger problems than a large theme registry. Sounds like it's worth trying.

moshe weitzman’s picture

That sounds like a good idea if file_exists() is really the problem. Most OS cache this information (not Windows?). But maybe it does not cache misses? Also, APC has an apc.stat pref which seems to force PHP to check for file existence every time. APC sites might get much better performance by flipping that to Off.

catch’s picture

Benchmarks of #12 vs. HEAD (as it was on 2nd May).

Couldn't get as much variation as yched but they don't look too far off.

HEAD:
7.14 / 7.28 / 7.15 [#/sec]

BAF:
5.51 / 5.46 / 5.46 [#/sec]

Option 1.
5.55 / 5.70 / 5.54 [#/sec]

Option 2.
5.53 / 5.66 / 5.62 [#/sec]

Option 3.
5.61 / 5.72 / 5.67 [#/sec]

yched’s picture

Many thanks catch.
This would mean that, unlike what my own benchmaks in #13 seemed to hint (but again, my setup seems quite poor at producing consistent benchmarks - bloated windows install, probably...), none of the optimization points that I thought I identified by going through the profiling logs brings us anywhere near...
I'm a bit at loss, here :-(. I'll sleep on that.

catch’s picture

FileSize
62.38 KB
67.5 KB

Also ran HEAD vs. baf through webgrind again.

As far as I can see, the vast majority of the extra work is happening in theme(). Problem is the call_user_func_array() in there makes it hard to track down why.

catch’s picture

So it's the 151 vs. 181 and 30 vs. 50 calls to call_user_func_array() inside theme() which interest me.

Need to go offline though so not going to look at which functions those are tonight.

catch’s picture

Took another look through webgrind and also some dpm() inside theme() to see what's getting called and how often.

I agree with yched that there's no obvious place for optimisation here - we have several function calls which aren't in HEAD, each of which can take 0ms-30ms on aggregate, and then template_preprocess_node() seems to account for most of the rest. There's no huge bottlenecks which can easily be identified. I don't think we can get any more out of profiling the body as field patch - if moshe's module could maybe give us 10 nodes each with 20 fields on, that'd be a more extreme example to see where the distribution of time is.

By the way I don't think we should postpone baf on this issue (which seems to be what's happening), it's just good to know what the extra flexibility costs us, and since it looks like we'll get the same cost from adding any field onto a node, we should be working on this issue independently from actual conversion of core fields.

catch’s picture

Also another plug for #369011: Performance: do check_markup() in text_field_load() which along with #333171: Optimize the field cache saves us 29 database queries on 10 nodes when applied against HEAD + baf. While it'd only be an optimisation for sites using database caching it claws back close to 50% of the performance hit in those cases.

alexanderpas’s picture

@#30
http://nl3.php.net/manual/en/function.clearstatcache.php

[...] note that PHP doesn't cache information about non-existent files. So, if you call file_exists() on a file that doesn't exist, it will return FALSE until you create the file. If you create the file, it will return TRUE even if you then delete the file. [...]

so, effectively... we need to build our own cache.

moshe weitzman’s picture

right ... i think it is worthwhile to create a new wrapper function for file_exists() that keeps a static about template files exist and which don't. it can just be a static that persists for the duration of the page. lets replace the file_exists() calls during theme() with this new wrapper.

alexanderpas’s picture

how about requiring that each .tpl file must be declared in the .info file (or added with drupal_add_template())

this will also count for modules obviously ;)

catch’s picture

That'd break the 'copy this template into your directory and edit it' workflow we have now. Also looking at webgrind I don't think file_exists() is the issue we have at all.

moshe weitzman’s picture

FileSize
613 bytes
2.78 KB
192 bytes

here is the module i mentioned. to be honest, i'm not sure what it is :). hopefully it can help with benchmarking (see path=fp). the uninstall does not work 100%

moshe weitzman’s picture

seem like i killed all conversation here :)

yched’s picture

catch posted other numbers in #504666: Make comments fieldable (#55 / #56).

Benchmarks below, + profiling info as webgrind shots: HEAD / fieldable comments patch

A node with 300 comments, no fields enabled on comments.

HEAD:

Document Path:          /node/6
Document Length:        685616 bytes

Concurrency Level:      1
Time taken for tests:   755.784 seconds
Complete requests:      500
Failed requests:        0
Write errors:           0
Total transferred:      343040500 bytes
HTML transferred:       342808000 bytes
Requests per second:    0.66 [#/sec] (mean)
Time per request:       1511.568 [ms] (mean)
Time per request:       1511.568 [ms] (mean, across all concurrent requests)
Transfer rate:          443.25 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:  1380 1511  75.1   1495    1897
Waiting:     1368 1498  74.7   1482    1884
Total:       1380 1512  75.1   1495    1898

Patch:

Document Path:          /node/6
Document Length:        687124 bytes

Concurrency Level:      1
Time taken for tests:   1038.024 seconds
Complete requests:      500
Failed requests:        0
Write errors:           0
Total transferred:      343794500 bytes
HTML transferred:       343562000 bytes
Requests per second:    0.48 [#/sec] (mean)
Time per request:       2076.047 [ms] (mean)
Time per request:       2076.047 [ms] (mean, across all concurrent requests)
Transfer rate:          323.44 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:  1927 2076  92.1   2059    2394
Waiting:     1911 2059  91.6   2043    2374
Total:       1927 2076  92.1   2059    2394
yched’s picture

As with the 'Body as field' patch, profiling info always seem to point to the 'render' part:
profiled samples linked in #43 indicate a 115ms difference between HEAD and patch.
- of those 115ms, 100 are spent in drupal_render
- of those 100ms, 55 are spent in theme
not sure where the other 45 are spent, whether in an identifiable spot or scattered ms by ms...
- of those 55ms, 35 are spent in theme_render_template

The intriguing thing, though, is that in this case this happens with *no* fields on comments. So theoretically there's no additional part to render or theme in $comment->content, nor any additional template to invoke...
[edit: turns out there was a bug in the 'fieldable comments' patch. What got benchmark is actually more like 'HEAD, 300 comments with no field' vs. 'patch, 300 comments with one field'. The breakdown above probably still makes sense in terms of "cost of rendering a field", though.]

catch’s picture

FileSize
251.67 KB

Bit more data - here's a webgrind screenshot with 50 comments - with the comment body and subject as field patch applied.

catch’s picture

FileSize
305.75 KB

theme_render_template() includes don't look very happy.

yched’s picture

Thanks for the additional data, catch. As previously, this does clearly point to overhead in the rendering step.
The strange thing, though, is that the various optimization attempts I posted in #12 (roughly, a/ reduce the number of template variants to look for, and b/ drop the templates and go for a theme function), and the discussion that ensued between #18 and #30, were shot down as no improvements by the benchmarks results you provided in #31, leaving the discussion a bit at loss...

We didn't really make use of the test module Moshe posted in #41, though. Isolating the profile data pertaining to just field rendering is really difficult because drupal_render is used all over the place in a regular page render, and is highly recursive. That module should provide valuable help, since it generates a page who *only* renders field data.

(side note: the webgrind shots you posted have an odd black area that masks the overall page execution time, making it hard to compare with individual function costs)

yched’s picture

Moshe's module in #41 probably needs to be changed to not generate field names starting with a number, which I think Field API now forbids (we're creating variables and object property names out of field names)

catch’s picture

All numbers in there are percentrages, sorry about the black area though, I should look at my screenshots before I post them (same as damn patches).

yched’s picture

FileSize
1.76 KB

Updated and polished the 'field profile' module Moshe posted in #41.

On install, the module creates 50 fields (cardinality 4), and 50 entities with random values for those fields.
The menu callback for path '/fp' renders the 50 entities and dies (thus, no additional page theming)

50 * 50 = 2500 fields with 4 values each, that's a monster page that takes quite some time on my setup, the numbers can be reduced if needed using the defined constants at the top of fp.install.

It seems uninstall doesn't work fully well, don't run this on a db you care about...

yched’s picture

Taking another stab at the optimizations tried earlied
- Patch 1 removes the field template suggestions
- Patch 2 replaces the template by a theme function.

Sorry for not doing the actual profiling myself, previous attempts showed my setup sucks at generating consistent profile data :-/

yched’s picture

Giving this another thought.

OK, I know it's a bit late in the game. Let's start this anyway.

In CCK D6 we made fields output using render arrays.
In the process, formatters were turned into theme functions.
Back then, 'security should not be left at the mercy of theme overrides' made us introduce hook_field_sanitize(), which is where we run check_markup() etc.

That hook is awkward because it doesn't even know which formatter will run in the end. Maybe even a formatter that is implemented by a 3rd party module. So the hook randomly sanitizes what it sees fit.

Now that theme functions can have preprocessors too, looks like there's no need for hook_field_sanitize() anymore.
I started to go that way for #639466: hook_options_list() + XSS filtering.

But given the D7 performance concerns, I'm wondering whether that whole 'formatters as theme' thing is really needed.
Might make sense for spiffy field types like image, file, media, noderef, whatever. But for the most frequent fields like text or number ? Is this really worth a theme call :

function theme_field_formatter_text_default($variables) {
  $element = $variables['element'];
  return $element['#item']['safe'];
}

With body + title + body comment as field, performance cost is high.

What if formatters just returned a render array ? If you want your formatter to be themeable, put a #theme key in the render aray.
Meanwhile, mission critical formatters and field types can just return some #markup.

No need for hook_field_sanitize() in that approach either, BTW.

Could be worth benchmarking, I guess...

moshe weitzman’s picture

Seems completely reasonable to me to make themeing optional for a formatter.

Frando’s picture

Yes, formatters returning render array sounds completely logical to me. Then, simple formatters can just set #markup or use an existing type while more complex ones set #theme etc. More flexible as what we have atm and also possibly faster for simple text formatters.

yched’s picture

FileSize
10.1 KB

Here's a proof of concept patch
- changes field_default_view() to call the new hook_field_formatter(), expects a render array. Hacked in, ugly code for now, doesn't work for 'multiple formatters' yet (formatters that handle the display of all values - rare).
- updates text.module formatters so this can be benchmarked with title and body fields. Other field types will fail (display nothing).
- also updates test_field's default formatter so this can be used with the 'field_profile' module in #50.

extension is .txt, we don't want test bot to run this.

yched’s picture

FileSize
1.77 KB

Plus: updated version of field_profile.module for current HEAD.
reminder from #50:
"On install, the module creates 50 fields (cardinality 4), and 50 test entities with random values for those fields.
The menu callback for path '/fp' renders the 50 entities and dies (thus, no additional page theming)

It seems uninstall doesn't work fully well, don't run this on a db you care about..."

sun’s picture

yched’s picture

Status: Needs work » Needs review

Bump.
#55 (formatters as render arrays instead of formatters as theme) is a non-minor API change and the refactoring might take some time and effort to polish. The sooner we know if and to what extend the change is worth it, the more chances we have to get this ready and accepted by Drieschick

I can't help with benchmarks, unfortunately, because my lame setup already proved to suck big time at perf measures.

Test scenario:
1) 10 article nodes with just a a title and a body - devel's 'generate' feature should be just fine, just be sure not to add taxo terms to the created nodes.
2) disable page cache
3) view the frontpage once to fill in field cache (to benchmark the 'display' side more accurately)
4) benchmark 500 page views of the front page as an anonymous user
5) apply patch
6) devel 'clear cache'
7) repeat steps 3) and 4)

Status: Needs review » Needs work

The last submitted patch failed testing.

moshe weitzman’s picture

I followed the process yched outlines. The patch yields a 2.9% speedup in requests per second. Thats quite promising, and the patch is worth pursuing.

~/htd/prfr$ ab -c1 -n500 http://mm/prfr/index.php
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking mm (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Finished 500 requests


Server Software:        Apache/2.2.14
Server Hostname:        mm
Server Port:            80

Document Path:          /prfr/index.php
Document Length:        19730 bytes

Concurrency Level:      1
Time taken for tests:   180.841 seconds
Complete requests:      500
Failed requests:        0
Write errors:           0
Total transferred:      10095000 bytes
HTML transferred:       9865000 bytes
Requests per second:    2.76 [#/sec] (mean)
Time per request:       361.683 [ms] (mean)
Time per request:       361.683 [ms] (mean, across all concurrent requests)
Transfer rate:          54.51 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   349  362  22.5    354     563
Waiting:      168  341  23.4    333     536
Total:        349  362  22.5    354     563

Percentage of the requests served within a certain time (ms)
  50%    354
  66%    359
  75%    362
  80%    365
  90%    379
  95%    398
  98%    430
  99%    472
 100%    563 (longest request)
~/htd/prfr$ p http://drupal.org/files/issues/field_formatter_render.txt
--2009-11-28 09:41:38--  http://drupal.org/files/issues/field_formatter_render.txt
Resolving drupal.org (drupal.org)... 140.211.166.6
Connecting to drupal.org (drupal.org)|140.211.166.6|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10345 (10K) [text/plain]
Saving to: “tmp.patch”

100%[================================================================>] 10,345      40.6K/s   in 0.2s    

2009-11-28 09:41:38 (40.6 KB/s) - “tmp.patch” saved [10345/10345]

patching file modules/field/field.default.inc
patching file modules/field/field.module
patching file modules/field/modules/text/text.module
patching file modules/field/tests/field_test.field.inc
cvs diff: Diffing .
~/htd/prfr$ drush cache clear
Cache cleared.
~/htd/prfr$ ab -c1 -n500 http://mm/prfr/index.php
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking mm (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Finished 500 requests


Server Software:        Apache/2.2.14
Server Hostname:        mm
Server Port:            80

Document Path:          /prfr/index.php
Document Length:        19460 bytes

Concurrency Level:      1
Time taken for tests:   175.790 seconds
Complete requests:      500
Failed requests:        0
Write errors:           0
Total transferred:      9960000 bytes
HTML transferred:       9730000 bytes
Requests per second:    2.84 [#/sec] (mean)
Time per request:       351.580 [ms] (mean)
Time per request:       351.580 [ms] (mean, across all concurrent requests)
Transfer rate:          55.33 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   346  351   4.8    350     388
Waiting:      326  332   5.3    330     379
Total:        346  352   4.8    350     388

Percentage of the requests served within a certain time (ms)
  50%    350
  66%    351
  75%    352
  80%    353
  90%    358
  95%    360
  98%    364
  99%    376
 100%    388 (longest request)
yched’s picture

Thanks Moshe.

So, in short: 3% speedup for listing pages with 10 items and 1 field per item (node title is not rendered through field theming). Gains rise with more nodes per page or more fields per node, obviously.
API break involved:
- Formatters: no more theme_field_formatter_MY_FORMATTER(), new hook_field_formatter() returning a render array. Yay, makes formatters and widgets much more similar.
- Field-types: hook_field_sanitize() is gone. Each formatter can handle its own sanitation while building its render array, since this is no longer in the hands of the theme layer. Yay, that hook was a WTF to begin with.

So, green light from core committers needed before we actually do the job :-p

Dries’s picture

Green light. Let's proceed with this patch. The green light will switch to a red light 5 days from now, on December 10th though.

catch’s picture

Does this mean December 10th is the deadline for other possibly necessary API changes to fix critical performance issues, or just this one because it's fairly widespread? If it's the former, I'd have serious concerns, because I don't even think we've found some of the critical issues yet let alone solutions for them.

yched’s picture

Mh, not sure I'll have the bandwidth to tackle this within the next 5 days :-/. I'll see what I can come up with, though.

Dries’s picture

The December 10th deadline was specifically meant for this issue because it is introducing a number of API changes. We'll re-evaluate on the 10th if necessary.
If we discover other major performance issues, we'll consider those on a case per case basis. Don't worry, just yet.

catch’s picture

Thanks Dries, no blood pressure pills for me just yet then ;)

effulgentsia’s picture

And a use-case that may hinge on this: #538164: Comment body as field. One challenge on that issue is that different people are getting very different numbers (even as a percentage) of what the impact is. Do we have a "standard production server spec" that we can use to base performance decisions on?

yched’s picture

catch’s picture

effulgentsia, we don't have any such standard. Not directly related to server spec (or this issue), but I opened #638078: Automated performance testing for core.

yched’s picture

Assigned: Unassigned » yched

Dries left us til tomorrow to get #652834: Field formatters as render arrays straight.
I'm working on the last test failures, but I could use some help with code reviews and benchmarks...

Anonymous’s picture

subscribe

effulgentsia’s picture

I opened a separate issue to deal with theme('field') optimization: #659788: [meta issue] theme('field') is too slow, which is inhibiting its more widespread use

moshe weitzman’s picture

Status: Needs work » Closed (fixed)

I think this issue has resuled in several spinoff issues. I'm closing this one, since I think we aren't working on anything specific anymore. Feel free to reopen if I am wrong.