
Comments

AohRveTPV’s picture

A workaround is to increase the PHP setting memory_limit to say 192M or 256M.
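
For example, in php.ini (or via ini_set() in settings.php); any value comfortably above the import's peak usage should do:

memory_limit = 256M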

AohRveTPV’s picture

Version: 7.x-2.2 » 7.x-2.x-dev
Status: Active » Needs review
FileSize
2.75 KB

I looked into this, and the problem is that PHP's parse_ini_string() and parse_ini_file() are not very memory efficient.

The official php-browscap library has also had this memory usage problem (see https://github.com/browscap/browscap-php/issues/20). Their current release branch uses parse_ini_file(). In their 3.x branch, though, they implement their own INI parser that is very memory efficient. The Browscap module could potentially use this parser.

However, I found a simpler way to reduce memory usage somewhat: browscap.org provides a JSON download, and PHP's JSON decoder seems to be more memory efficient than its INI parser.

This patch reduces memory usage from 115MB to 81MB with Browscap data version 5308. (Your peak memory usage may vary if the import occurs within a cron run.) Please review and/or test.
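
Roughly, the patch swaps the INI download for the JSON one and decodes it with json_decode(); a simplified sketch (the exact URL query and variable names here are assumptions, not the literal patch code):

$response = drupal_http_request('http://browscap.org/stream?q=BrowscapJSON');
if ($response->code == 200) {
  // json_decode() turned out to need less memory here than parse_ini_string().
  $browscap_data = json_decode($response->data, TRUE);
}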

AohRveTPV’s picture

Summary of possible solutions:
- Increase memory_limit in php.ini.
- Apply the patch in #5, which decreases memory usage by about 30%.
- Re-use memory-efficient custom INI parser from php-browscap 3.x.
- Something else I haven't thought of.

greggles’s picture

Is this work merged into the patch in comment #26 of #2305223: Optimize database access in _browscap_import()?

greggles’s picture

Status: Needs review » Needs work

Needs a reroll after the changes in #2305223.

AohRveTPV’s picture

Status: Needs work » Needs review
FileSize
3.54 KB

Updated patch, reduces memory usage from 115MB to 86MB.

After rerolling, memory usage was only reduced to 100MB. The user agent de-duplication code introduced in #2305223: Optimize database access in _browscap_import() became the peak usage. So I changed some of that code around and got the memory usage back down to 86MB. It could probably be reduced back to 81MB with further effort but I think this patch may be good enough.

Import time is not noticeably different. I did not yet verify that the loaded data is the exact same after this change.

greggles’s picture

Thanks, I'll try to review this soon. Some quick questions:

Does GJK mean something to you? Those are my initials and I sometimes use them in variable names for something that I want to change for testing but don't want to commit. I don't see them in the existing code, so I think we should use something more descriptive of the purpose.

What are you using to measure the memory used?

AohRveTPV’s picture

Does GJK mean something to you? Those are my initials and I sometimes use them in variable names for something that I want to change for testing but don't want to commit. I don't see them in the existing code, so I think we should use something more descriptive of the purpose.

It is in the Browscap JSON data--not sure what it means. Maybe a coincidence on the initials.

By the way, this line (and only this line) looks to be ineffectual and could be removed:

$version = $browscap_data['GJK_Browscap_Version']['Version'];

$version is never used; the unpatched code assigns it but apparently never reads it.

What are you using to measure the memory used?

print memory_get_peak_usage(); at the end of _browscap_import(). Not sure if there's a better way.

AohRveTPV’s picture

Gary Keith was an early maintainer (the first?) of the Browscap data. Maybe an initials collision.

AohRveTPV’s picture

Status: Needs review » Needs work

Compared the browscap table produced by #9 with the browscap table produced by HEAD, and there is at least one difference: one has string "true" and "false" values, the other boolean 1 and 0 values. Need to investigate further.

AohRveTPV’s picture

Status: Needs work » Needs review
FileSize
4.05 KB
700 bytes

The browscap table produced by this patch is identical to the table produced by HEAD. Same peak memory usage as #9, no change in speed. Not sure if there is a simpler way to convert TRUE and FALSE to "true" and "false".
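
For reference, the conversion amounts to walking each browser's properties and stringifying booleans; a simplified sketch (not the literal patch code):

foreach ($properties as $name => $value) {
  if (is_bool($value)) {
    // Match the 'true'/'false' strings that the INI import produces.
    $properties[$name] = $value ? 'true' : 'false';
  }
}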

greggles’s picture

Version: 7.x-2.x-dev » 6.x-2.x-dev
Status: Needs review » Patch (to be ported)

Great, committed that as well.

Thanks!

greggles’s picture

Version: 6.x-2.x-dev » 7.x-2.x-dev
Status: Patch (to be ported) » Needs review

Whoops, that comment was meant to be posted on #2305223: Optimize database access in _browscap_import(). I'd still like to review this a bit more.

lias’s picture

This memory exhausted error is currently preventing a brand new install of Browscap (7.x-2.2+5-dev) from fetching the data, so I am unable to get it to work with 128M of memory.

AohRveTPV’s picture

lias, I just checked the peak memory usage of the latest 7.x-2.x with Browscap data version 6000: 89MB when using drush browscap-import, and 83MB through the web interface.

That memory usage is close to 128MB so I suspect if your site itself uses a decent amount of memory, a 128MB limit will be exceeded.

Would be interested to know (if you have time):
- Is the 128MB memory limit also exceeded on your system with a new, minimal Drupal installation?
- Is the memory usage better or worse than with the 7.x-2.2 stable release and Browscap data version 6000?

You might also try importing the data using drush browscap-import. Not sure but that may reduce the memory usage in your environment.

AohRveTPV’s picture

lias, I just realized that 7.x-2.x-dev does not yet include the patch in #14. Please try that patch if you can; it should improve the memory usage.

AohRveTPV’s picture

If you are running PHP 5.2 or above, this patch for Browscap 6.x-2.x should improve memory usage. I measured 75MB when using the "Refresh browscap data" button to import Browscap data version 6000.

Unfortunately, it is not suitable for commit because Drupal 6 requires only PHP 5, but the patch needs PHP 5.2 for json_decode().

greggles’s picture

It's fine for a contrib to require a higher version of php, especially since older versions of php are no longer supported by the php project or by many hosting companies. Just add a php = requirement to the .info file.
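
For example, in browscap.info (the version shown is just illustrative):

; Require PHP 5.2 for json_decode().
php = 5.2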

peterx’s picture

https://github.com/salsify/jsonstreamingparser is probably the next step. One test of the streaming parser cut 12 MB down to 1 MB. You could update the database row by row or in small batches.

AohRveTPV’s picture

It's fine for a contrib to require a higher version of php

Thanks, did not know this.

At least RHEL 5 (and derivatives like CentOS 5) is still supported and still packages PHP 5.1, so it seems realistic that some sites may still be on PHP 5.1. Since json_decode() was introduced in PHP 5.2, perhaps the safe solution would be to use it conditionally based on the PHP version.
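
A rough sketch of what that conditional path could look like (variable names and the temporary-file handling are illustrative assumptions):

if (function_exists('json_decode')) {
  // PHP 5.2+: decode the smaller-memory JSON download.
  $browscap_data = json_decode($json_string, TRUE);
}
else {
  // Older PHP: keep the existing INI import, writing the download to a
  // temporary file so parse_ini_file() can handle it.
  $ini_file = tempnam(file_directory_temp(), 'browscap');
  file_put_contents($ini_file, $ini_string);
  $browscap_data = parse_ini_file($ini_file, TRUE);
  unlink($ini_file);
}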

peterx’s picture

The jsonstreamingparser does not use json_decode(), removing another problem.

AohRveTPV’s picture

https://github.com/salsify/jsonstreamingparser is probably the next step. One test of the streaming parser cut 12 MB down to 1 MB. You could update the database row by row or in small batches.

A recent branch of the Browscap PHP extension has a custom INI parser that is very memory efficient (as mentioned in #5). This JSON parser is a good suggestion but I wonder if the INI parser might be a better choice since the Browscap project will presumably make sure it works with their data. However, using either parser would probably be a welcome improvement over the current code.

If we use a third-party parser, it should be installed as a library rather than directly included in the module, I guess? So the user installs the module, then hook_requirements() tells them to install the library? Maybe the code could fall back to using the less efficient parser.

The jsonstreamingparser does not use json_decode(), removing another problem.

The Composer file indicates jsonstreamingparser requires PHP 5.3, so PHP version seems like it would still be an issue.

lias’s picture

@AohRveTPV thank you for the info re: patching 7.x-2.x-dev. I have done so, and when I refreshed the browscap data it worked! It still took a while, but at least the memory required was less than the 128M it was initially breaking under. I will keep auto updates disabled at this time.

FYI - I am unable to use drush because Drupal is on a Windows box.

Thank you for the assistance.

peterx’s picture

Decoding a .ini file with parse_ini_string() should be slightly easier than decoding JSON. You can read the file line by line, collect a section, then decode it and update the database. The database update will be the slowest bit. Aggregating several rows in each update will make the process faster. Given that it runs in the background, the main problem with a really slow process will be timeouts. See the sketch at the end of this comment.

Would be nice if the Browscap people provided a set of smaller files or just a weekly delta file.
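
Untested sketch of that loop (PHP 5.3+ for parse_ini_string(); $ini_file is a local copy of the download):

$handle = fopen($ini_file, 'r');
$section = '';
while (($line = fgets($handle)) !== FALSE) {
  if (strpos($line, '[') === 0 && $section != '') {
    // A new [header] begins: decode the collected section and update the
    // database, aggregating several rows per insert.
    $parsed = parse_ini_string($section, TRUE);
    // ... save $parsed here ...
    $section = '';
  }
  $section .= $line;
}
if ($section != '') {
  $parsed = parse_ini_string($section, TRUE);
  // ... save the final section ...
}
fclose($handle);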

peterx’s picture

Ran a test reading a local file one section at a time through parse_ini_string():
Read the browscap file php_browscap.ini.
Memory usage: 1,838,624 bytes after 0.028544 seconds and 0 lines.
Memory usage: 1,840,584 bytes after 1.284847 seconds and 398955 lines.

Read the browscap file at full_php_browscap.ini.
Memory usage: 1,839,296 bytes after 0.027104 seconds and 0 lines.
Memory usage: 1,841,512 bytes after 2.769346 seconds and 994676 lines.

Read the browscap file at http://browscap.org/stream?q=Full_PHP_BrowsCapINI.
Memory usage: 1,838,632 bytes after 0.031708 seconds and 0 lines.
Memory usage: 1,844,136 bytes after 23.075820 seconds and 994676 lines.

AohRveTPV’s picture

Property inheritance is a challenge for parsing the INI file in parts. To load all the parent properties for a browser, those parents (currently) have to be in memory. With all the properties in memory, I think the memory usage might still be high. One solution would be to change Browscap to not inherit properties at import as suggested in #2411405: Reduce browscap table size, which would seem to allow loading only one browser's properties into memory at a time.

peterx’s picture

_browscap_save_parsed_data() batches the import inserts for efficiency. We can use that.

Cache the default properties because they are used frequently.
Read the sections until you hit a section with the default properties as its parent.
Process that batch.

The insert code could still aggregate 1000 rows.

AohRveTPV’s picture

#31 sounds like a pretty good approach to me. If you are implementing this I would be happy to review/test.

peterx’s picture

@AohRveTPV, almost there.
Read the browscap file at php_browscap.ini.
Memory usage: 1,732,768 bytes after 0.029620 seconds and 0 lines.
Memory usage: 27,764,280 bytes after 23.525675 seconds and 398955 lines.

peterx’s picture

FileSize
14.63 KB

If anyone wants to experiment, here is my code and browscap in the same lump. My code is run from a test page containing:
$file = '/home/peter/Downloads/browscap/php_browscap.ini';
$import = new browscap_import(FALSE, $file);
Testing from the browscap site is a problem because they switch off access after a small number of accesses. You then have to wait 24 hours.

The database inserts are not batched to 1000. They are batched based on a common parent. You could add extra code to check for 1000 entries then detect the change of parent.

PHP < 5.3 will still require batch .ini decoding through a temp file.

izmeez’s picture

Thanks, the patch in #21 works well for 6.x-2.1.

AohRveTPV’s picture

Status: Needs review » Needs work
FileSize
6.45 KB

Thanks, peterx. I did not realize before that each part of the INI file delimited by header comments was self-contained. (It contains all parents for the sections except for Default Properties.) That makes parsing in parts fairly straightforward.

This patch re-implements your basic approach (I think) on top of import.inc. We could use your code too, but I was not sure how to integrate it with the existing code, and it seems a ways off from complying with the coding standards. If you want to integrate your code and change it to follow the coding standards, I would be glad to work on it instead.

Before patch:
Execution time: 55s
Peak memory usage: 123MB

After patch:
Execution time: 95s
Peak memory usage: 44MB

AohRveTPV’s picture

FileSize
6.45 KB

Patch in previous comment was misnamed.

peterx’s picture

Thank you AohRveTPV for looking at my code as a proof of concept instead of a patch. My other code additions were purely to make testing easier by reading a local file.

The code format style is, in part, Drupal 8. I prefer some of the changes in Drupal 8 because they are closer to the most readable code styles. It is hard going back to the D7 style.

AohRveTPV’s picture

Status: Needs work » Needs review
FileSize
10.06 KB

This patch fixes problems with #37 and is ready for review/testing.

The INI file is parsed by "divisions". (Divisions are what the Browscap project calls the parts of the INI file it generates that are delimited by header comments.) We read one division at a time, parse it as an INI string/file using the built-in PHP functions, then save it.

This reduces memory usage to 42MB from 123MB with Browscap data version 6001, but increases import request time from 55s to 85s on my system with MySQL InnoDB. I think this is a good trade-off because it will avoid the PHP memory limit in common configurations, and the long request time will probably not cause any problems it was not already causing.

A downside of this patch is it makes the INI parsing a bit more fragile. Rather than just relying on a well-formatted INI file, we add some assumptions about the file:
- It is organized in divisions which are separated by header comments prefixed with 40 semicolons.
- The default properties division is the third division.
- Each division after the third inherits only from parent sections within its division, other than the default properties section.

I added some assertions for the first two of these assumptions so that importing will hopefully fail gracefully if the Browscap project changes how they are formatting the INI file. (The Browscap project has no reason to expect external code is relying on this organization.)
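
Stripped down, the division handling looks roughly like this (the real patch also tracks the DefaultProperties division, keeps the header comments, and performs the assertions, so treat this only as the core idea):

// Divisions are separated by header comments prefixed with 40 semicolons.
$delimiter = "\n" . str_repeat(';', 40);
$divisions = explode($delimiter, $browscap_data);
foreach ($divisions as $division) {
  // Each later division only inherits from parents inside itself (plus
  // DefaultProperties), so it can be parsed and saved on its own.
  $parsed = parse_ini_string($division, TRUE);
  // ... merge in the default properties and save the rows here ...
}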

A problem is that when reading in pieces it was not easily possible to remove case duplicates globally. So I included the code from #2310537: 'useragent' field not case sensitive, which changes the useragent column of the Browscap table to be case sensitive.

Chunking database inserts 1000 user agents at a time was removed, as peterx did, because it does not help performance noticeably; processing by divisions effectively provides the chunking.

greggles’s picture

I thought we decided in #2310537 that case-sensitivity was a bad idea?

AohRveTPV’s picture

I thought we decided in #2310537 that case-sensitivity was a bad idea?

Yes. The case duplicates are seemingly useless because browscap_get_browser() does a case-insensitive query (as does get_browser()). So we decided to just remove case duplicates.

The problem though is if the INI file is being processed in parts as in #39, there is no way to know whether a user agent in the current part has a case duplicate in a later part of the file. (Parsing the entire file at once in order to remove duplicates as we are currently doing would defeat the purpose of parsing in parts to reduce memory usage.) Some possible solutions:
1. Keep a list of all user agents encountered and only insert user agents not in that list.
2. Query the database to make sure user agent is not already present before inserting.
3. Preprocess the raw INI string and remove case duplicates before parsing.
4. Change useragent column to binary so case duplicates are allowed.

#1 would increase memory usage and increase execution time. (How much I'm not sure.) #2 would increase execution time. #3 would work but requires us to write code to parse/edit the INI string. #4 is simpler to implement than #3 and avoids having to deal with case duplicates at all.

Even with the useragent column being binary, browscap_get_browser() is still case insensitive. So there seems to be no change in behavior. Are there any downsides to the column being binary?

peterx’s picture

I tried 1 and 2. Option 2 increased the import time to hours. For option 1, I stored only the lower-case versions and compared with lower case; the time was not much different.

AohRveTPV’s picture

FileSize
2.48 KB

peterx, I just quickly implemented the approach of keeping track of all inserted user agents and only inserting user agents that are not case duplicates of members of that list.

Before:
Peak memory usage: 42MB
Execution time: 85s

After:
Peak memory usage: 60MB
Execution time: 85s

Indeed the execution time is no different, but the memory usage is worse. Maybe I am doing something inefficiently.

This patch shows changes made versus #39.

AohRveTPV’s picture

Minor code simplification versus #39. No noticeable performance impact.

AohRveTPV’s picture

I looked into reducing the memory usage of #44. Peak memory usage is 27MB with the downloaded INI string in memory, but increases to 42MB once that string is passed to strpos(). PHP seems to allocate a buffer for the entire string passed to strpos(). So I do not see a simple way to reduce the memory usage further.

#33 reported 27MB. I suspect this lower number is due to either:
1. Measuring memory usage using memory_get_usage() instead of memory_get_peak_usage(). With #44, for instance, memory usage spikes to 42MB during strpos() but returns to 27MB immediately after.
2. Reading from an INI file instead of an INI string in memory.

AohRveTPV’s picture

Regarding the case issue, making the useragent column case sensitive currently seems to me like the ideal solution. The Browscap data is case sensitive, so if we make the Browscap module storage of it case sensitive, the problem is solved. It adds complexity to the code (and inefficiency, apparently) to manipulate the case.

Maybe there are drawbacks to making the column "binary"? I do not know of any. It is supported by the Schema API and is not a hack:

 *     - 'binary': A boolean indicating that MySQL should force 'char',
 *       'varchar' or 'text' fields to use case-sensitive binary collation.
 *       This has no effect on other database types for which case sensitivity
 *       is already the default behavior.

Edit: I suppose it could change whether user agent look-ups are case sensitive with database systems other than MySQL. Will test.
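
For reference, the schema change is just one extra key on the field definition (the length shown is illustrative):

'useragent' => array(
  'type' => 'varchar',
  'length' => 255,
  'not null' => TRUE,
  'default' => '',
  // Forces case-sensitive (binary) collation on MySQL; no effect elsewhere.
  'binary' => TRUE,
),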

AohRveTPV’s picture

I tested case sensitive look-ups with several database systems:

Before patch in #44:
MySQL: insensitive
SQLite: insensitive
PostgreSQL: sensitive

After patch in #44:
MySQL: insensitive
SQLite: insensitive
PostgreSQL: sensitive

So the change of the user agent column to binary does not affect the case behavior for those database systems.

AohRveTPV’s picture

Re #47:

We could, for consistency, make PostgreSQL also do a case insensitive look-up if we used db_select() instead of db_query(). The LIKE condition gets replaced with ILIKE by the PostgreSQL implementation of the QueryConditionInterface.

Alternatively, since db_query() supposedly gives better performance, we could just decide we do not care that PostgreSQL browscap_get_browser() is case sensitive while PHP get_browser(), MySQL browscap_get_browser(), and SQLite browscap_get_browser() are case insensitive.

Either way, the column being binary seems like it should not matter. Even if the useragent column is case sensitive, the LIKE condition can still be case insensitive.
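
For example, a db_select() version would be roughly the following (column names here are assumed, not copied from the module):

$data = db_select('browscap', 'b')
  ->fields('b', array('data'))
  ->condition('useragent', $user_agent, 'LIKE')
  ->execute()
  ->fetchField();

On PostgreSQL the LIKE condition is rewritten to ILIKE by the driver, so the look-up becomes case insensitive without any change on our side.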

peterx’s picture

Used some new code to download the browscap file to a local temp file:
Reading http://browscap.org//stream?q=PHP_BrowsCapINI
Memory usage (real) at start: 1835008
Memory usage (real) at end: 2097152

The file can then be read row by row for local processing.

https://www.drupal.org/sandbox/peter/2475021
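
One way to do that copy without buffering the whole file in memory (not necessarily the sandbox code; allow_url_fopen must be enabled, error handling omitted):

$source = fopen('http://browscap.org/stream?q=PHP_BrowsCapINI', 'r');
$temp_name = tempnam(file_directory_temp(), 'browscap');
$temp = fopen($temp_name, 'w');
// Copy the remote stream to disk chunk by chunk.
stream_copy_to_stream($source, $temp);
fclose($source);
fclose($temp);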

AohRveTPV’s picture

Used some new code to download the browscap file to a local temp file:
Reading http://browscap.org//stream?q=PHP_BrowsCapINI
Memory usage (real) at start: 1835008
Memory usage (real) at end: 2097152

The file can then be read row by row for local processing.

https://www.drupal.org/sandbox/peter/2475021

That is impressive! My understanding is this approach also requires parsing the INI string in divisions (as done in #36 or #44). I propose that we try to get the parsing by divisions committed first, then we could extend to further reduce the memory usage using row-by-row file reading. Parsing by divisions is already a complex change that will need review/testing.

VBN’s picture

I have a question about browscap.
It was running fine, but after the automatic update I had the same error as the topic starter.
Even worse, because I could not run cron anymore...
So I wanted to install a previous version again.

The strange thing is that I disabled the module, and the error disappeared.
My site seems to be running fine without the browscap module (which I installed with the Themekey module).

I did some tests, and the only thing that is not working right now is the automatic detection and theme switch for mobile devices.
- What else does it do? Is it really necessary besides Themekey/mobile switch?
- Would it be OK to delete the module and reinstall a previous version?

peterx’s picture

@VBN, it is not the Browscap software; it is the downloaded file growing larger every day. There is no easy way to request an older, smaller file. There are three alternatives for mobile detection.

* Use the browscap patches to use less memory. Hopefully this will become a proper dev release soon.

* Replace themekey with another module that does not use browscap.

* Replace Browscap with another module that feeds into Themekey.

I have a site that needs Themekey only for mobile and I wrote a module that uses just a few lines to detect the word "mobile" in the user agent string. At another site, I produced something to feed into Themekey. The two could be combined. You would not get all the other information returned by Browscap.

VBN’s picture

@Peter,

Thx for your interest.
Strange that a downloaded file keeps messing with memory.
But I must admit, I don't really understand what browscap does... never mind that.
Is it possible to remove that downloaded file and just start over again?
(Like you would with a new install.)

I'm very interested in that little workaround you created.
I just want users to have the mobile theme automatically selected if they use a mobile device.
But I really like to keep Themekey, it's a nice and easy feature.
Since it's the same purpose for me, maybe you would care to share your solution?

Maybe better to contact me by PM if you don't think it's appropriate to reply here.
Thx in advance.

AohRveTPV’s picture

Is it possible to remove that downloaded file and just start over again?
(Like you would with a new install.)

Yes, you can just disable, uninstall, then enable the module. That will remove all downloaded Browscap data. However, you may get the "memory exhausted" error when you try importing (i.e., refreshing) the data again.

Note: The downloaded Browscap data is stored in the database and not a file.

But I must admit, I don't really understand what browscap does... never mind that.

The module just provides a way to look up browser information for a given user agent string. For example, Browscap can translate Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Firefox/31.0 to information about that browser and its capabilities.

So the module is mainly used as a dependency for other modules that need to be able to do such a look-up.
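
For example (the exact property keys depend on the imported Browscap data):

$browser = browscap_get_browser('Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Firefox/31.0');
// $browser is an array of Browscap properties for that user agent,
// e.g. the browser name, version, and mobile/crawler flags.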

AohRveTPV’s picture

Use the browscap patches to use less memory. Hopefully this will become a proper dev release soon.

Was hoping for one more person to review/test #44 before setting the issue to RTBC. Thanks for reporting back that it is working for you, peterx.

peterx’s picture

@VBN, testing the revised Browscap module would help push the current change into a dev or production release.

After you test it and report the results back here, you might like to test the little alternative I put in a sandbox while experimenting with ways to shrink Browscap.
https://www.drupal.org/sandbox/peter/2484347

The Browscap file contains more information, is used on tens of thousands of sites, and will be better maintained.

VBN’s picture

Hi Peterx,

I updated my site to the 7.x-2.2+5-dev version and applied the patch from #44.
No problems when running cron at this point.

But when I push the "Refresh browscap data" button, the status remains on "Never fetched"?

PS: Could you upload the patched browscap.install and import.inc files here?
Maybe I did something wrong... :)

AohRveTPV’s picture

FileSize
4.41 KB

But when I push the "Refresh browscap data" button, the status remains on "Never fetched"?

Any errors in the Drupal watchdog? PHP errors in the web server log?

The request could be timing out before completion.

PS: Could you upload the patched browscap.install and import.inc files here?

Done.

VBN’s picture

Ok AohRveTPV,

Thank you so very much.
I replaced my files in the module with your uploaded ones.
Working like a charm! I refreshed, and I have version 6002 now.
So it was my faulty patch application... (tsss, I really need to use Cygwin)
I can confirm that patch #44 is working.
You should push this into a production release.

Great stuff you share here!

AohRveTPV’s picture

Status: Needs review » Needs work

Reviewing #44 I found at least two mistakes in comments. Will fix.

Also want to add type hinting to the new functions.

AohRveTPV’s picture

Status: Needs work » Needs review
FileSize
10.26 KB
1.22 KB

Changes:
1. Clarify/fix a comment.
2. Fix apparent error in parsing data with PHP <5.3.

Need to test the second change.

AohRveTPV’s picture

I can confirm that patch #44 is working.

Great, thanks for reporting back that it is working.

So it was my faulty patch application... (tsss, I really need to use Cygwin)

The Git for Windows installer includes MinGW, which provides a Unix-like shell. In that shell you can apply a patch using git apply -v foo.patch. It probably also includes the patch utility, which can also be used to apply a patch.

VBN’s picture

The Git for Windows installer includes MinGW, which provides a Unix-like shell. In that shell you can apply a patch using git apply -v foo.patch. It probably also includes the patch utility, which can also be used to apply a patch.

Thx, meanwhile I found that NetBeans provides a nice, easy way to apply patches.

AohRveTPV’s picture

2. Fix apparent error in parsing data with PHP <5.3.

Need to test the second change.

Would like to just remove the PHP <5.3 code because it seems useless and I cannot find an easy way to test it:
#2509122: Remove PHP 5.2 or earlier support

izmeez’s picture

Regarding 6.x, further to comment #35 and the patch in #21: now, after 2 months, cron is choking with memory exhausted errors.

AohRveTPV’s picture

Thanks for the update, izmeez. It looks like the Browscap project changed the JSON file to have the same data as the full INI file:
https://github.com/browscap/browscap/pull/594

This change approximately doubled the size of the file and proportionally increased the memory usage. Previously the JSON file contained the same data as the standard INI file, which is currently used by unpatched Browscap.

So, the patches in #14 and #21 are no longer useful. The approach used by the 7.x-2.x patch in #62 should still work. It could be ported to 6.x-2.x.

As a hack, you could change the URL used in import.inc to that of the "lite" INI file:

Change:
http://browscap.org/stream?q=PHP_BrowsCapINI

To:
http://browscap.org/stream?q=Lite_PHP_BrowsCapINI

That should work if the "lite" file contains the properties you need.