Closed (fixed)
Project:
Localization server
Version:
6.x-1.x-dev
Component:
Code
Priority:
Normal
Category:
Task
Assigned:
Reporter:
Created:
17 Feb 2009 at 18:42 UTC
Updated:
2 Oct 2009 at 18:50 UTC
Jump to comment: Most recent file
Damien Tournoud says that it should be straightforward to denormalize the data structure by pushing down the project id to release, file and line tables and the release id to file and line tables, since the file and line rows are unique to a release of a project and are not shared among the different releases of project or even different projects. That would allow us to join from the line table in regular queries and skip the release and file table (we already skip the project table).
| Comment | File | Size | Author |
|---|---|---|---|
| #21 | include-missed.patch | 829 bytes | gábor hojtsy |
| #20 | pid-rid-denormalization.patch | 43.54 KB | gábor hojtsy |
| #19 | pid-rid-denormalization.patch | 43.24 KB | gábor hojtsy |
| #18 | pid-rid-denormalize.patch | 17.23 KB | gábor hojtsy |
| #16 | l10n_denormalize_6.patch | 16.39 KB | meba |
Comments
Comment #1
meba commentedJust saving progress. Attached patch adds new columns and copies data from existing installations.
Working on inserting data during data creation and selects.
Comment #2
meba commentedOK, this is actually getting ready for review, Gabor :-)
Status: New columns in database, update is moving data to them, extractor is able to save strings and files with the data.
Moving to simplifying the join query
Comment #3
meba commentedOK, this seems to be done. Implemented stuff:
- Adding new fields
- Populating them with data either on update from old installation and both fresh installation and scanning packages
- Editing screen using new columns
Moving to next issue.
Comment #4
gábor hojtsyApplied the patch and did a quick review. I am determined to commit this as soon as possible to help you work on the other fixes. So here are the issues I found:
1. in the update code, you use comma for JOIN, while rest of Drupal always uses the proper spelled out JOIN to show what is happening better.
2. I did a grep on both changed tables to find queries related; came up with this list of queries which should take advantage of the new table structure as well; packaging or Ajax calls are just as much important to make perform better:
3. The cascade delete in l10n_community_delete_project() can now be made postresql compatible, since we have the pid on all rows, so we can just remove the items based on pid from all three tables in three DELETE's not just one cascade delete. That would be an easy and great way to jump forward. A good side-effect of this performance optimization.
Thanks for all your hard work.
Comment #5
meba commentedSee:
SELECT COUNT(DISTINCT t.sid) AS translation_count, r.pid, t.is_suggestion FROM {l10n_community_release} r INNER JOIN {l10n_community_file} f ON r.rid = f.rid INNER JOIN {l10n_community_line} l ON f.fid = l.fid INNER JOIN {l10n_community_string} s ON l.sid = s.sid LEFT JOIN {l10n_community_translation} t ON s.sid = t.sid WHERE t.is_active = 1 AND t.translation != ''Changed into:
Is there a reason to join from translation to string and then to line if we already have sid in translation so joining translation+line would be enough?
Comment #6
gábor hojtsyHuh, I don't see a reason to join with lines there.
Comment #7
meba commentedSorry for misunderstanding...
See:
Compared to:
This should give same results in my opinion but I found the previous version (with older joins too) in the code so I am trying to find a reason for that :)
Comment #8
meba commentedAttaching new patch.
How to test revised SQL queries:
- Go to Explore projects page, Explore languages page and Translate welcome screen
- Note all numbers displayed on each page
- Apply patch
- DELETE FROM cache;
- Reload all pages
All numbers should stay the same
Comment #9
meba commentedI would also like to do some benchmarks.
First impressions:
New query:
BUt second (new) seems to be much slower!
Comment #10
gábor hojtsyWell, we did not change any indexes keeping the new queries in mind!
Comment #11
killes@www.drop.org commentedThe reason why the second query is slower is likely that it looks at much more rows in the first table.
Maybe a better index helps.
Comment #12
killes@www.drop.org commentedActually, the best way to deal with this might be the materialized views approach that David has taken in tracker2, ie we'd keep a table with precalculated values.
Comment #13
meba commentedWell, should I try to implement something like materialized views or should we continue on this patch?
Comment #14
gábor hojtsyWell, pid and rid are "precalculated" just like in materialized views, and help us skip a few tables in most cases. I think we should trade somewhere inbetween the best overall performance improvement and manageable code.
Comment #15
meba commentedChanging the query to LEFT JOIN saved over 2 seconds...
Posting an updated patch.
Comment #16
meba commentedForgot to delete from releases, fixed one phpdoc
Comment #17
gábor hojtsyI'm taking this on. Implemented some devel generate hooks which can be used with the 6MB database dump I have from localize.drupal.org. I can share that with you @meba, if interested. The two coupled can make a nice testing environment, where some of the queries are extremely slow :) Time to look into skipping a few tables in the queries.
Comment #18
gábor hojtsyTook the schema changes and the update code at first. Fixed the update code to work with less SQL queries. We can populate the pid based on the release list and populate the pid, rid on lines based on releases after that. This is still about 10000 queries on the update on ldo, but it runs well on my machine.
Then went to apply this to the overview/explore pages. Found that some of the DISTINCTs can be eliminated, but it was still a lot of time to render that with the generated data set. So added caching to the project and language overviews. Also took the changes for the translation and moderation listing queries. Did not take the other changes yet, but that is in progress.
Note that you did not include code to actually fill in the pid and rid when new code is actually parsed. So while the existing data is updated, it is not maintained with new data added.
To work on this more. Maybe only next week.
Comment #19
gábor hojtsyOh, of course you did include code to fill in the pid and rid in the new columns. Sorry for overlooking. I looked over all the changes in your patch and implemented them in this attached patch. Actually ended up implementing the rid/pid filling via stored statics, so that less queries are required. and extractions are quicker.
Also, took the opportunity to rework the query builder for the browse/translate pages to use the more flexible arrays as on the moderate page. Since the simpletests have pretty good coverage of these pages, I could easily verify that my refactorings worked (and initially they did not :).
Also, added more cache clearing and some advance rebuilding, so cron clears off all caches but rebuild the three most important ones. Also, removing or adding a language team clears the caches.
The test suite runs fine with this patch, so I'll tinker with it a little and hopefully commit and deploy tonight.
Comment #20
gábor hojtsySlightly tweaked comments, committed this patch. Thanks for all your contributions.
Comment #21
gábor hojtsyForgot to include files at one place. Committed this too.