Currently, the robotstxt variable is set in robotstxt_install(). This disagrees with core practices and standards. Additionally, consider the following scenario:

- User has changed their ./robots.txt file with some custom additions (call it revision A).
- User installs the robotstxt module, but doesn't actually visit the configuration pages. The 'robotstxt' variable is set to the content of ./robots.txt revision A.
- User forgets they have the robotstxt module enabled and makes another addition to their ./robots.txt file (call it revision B).
- When the robots.txt path is loaded by the module, the content that is shown is revision A.
- User is confused as to why revision B is not shown.

Since the user never went to admin/settings/robotstxt and pressed 'Save configuration', the proper way would to not set the variable on install, and always check if the variable has content, and if not, load data from the ./robots.txt files.

Does that makes sense?

CommentFileSizeAuthor
0-robotstxt-remove-install-D6.patch937 bytesdave reid

Comments

dave reid’s picture

Issue tags: +Needs backport to D5
hass’s picture

Status: Needs review » Closed (won't fix)

No, it's too expensive to read the robots.txt file on every page request. This is why the file is imported in install. Variables table is cached.

hass’s picture

Status: Closed (won't fix) » Closed (works as designed)
dave reid’s picture

Status: Closed (works as designed) » Postponed (maintainer needs more info)

I didn't consider that. Let me do some actual benchmarks first and if it's a noticeable different, I'll mark this back as by design.

hass’s picture

I'm pretty sure I will not change my mind about this as it also overcomplicates the way how we throw the warning message on the configuration page and you haven't added the fall back logic to read the root's robots.txt on install (if it may exists) and if not exists the modules robots.txt. Normally the file should not exists, but a support case have shown that we need to write this install step down... Don't waste your time :-).

dave reid’s picture

Here's the benchmarks using ab -n 500 http://mysql.drupal6.local/robots.txt:

READING FROM ./robots.txt (bypass module)

Requests per second:    1649.01 [#/sec] (mean)
Time per request:       0.606 [ms] (mean)
Time per request:       0.606 [ms] (mean, across all concurrent requests)
Transfer rate:          3034.17 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     0    0   1.4      0      24
Waiting:        0    0   0.6      0       9
Total:          0    0   1.4      0      24

READING FROM saved 'robotstxt' variable

Requests per second:    7.61 [#/sec] (mean)
Time per request:       131.368 [ms] (mean)
Time per request:       131.368 [ms] (mean, across all concurrent requests)
Transfer rate:          16.59 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   128  130   3.1    130     178
Waiting:      128  130   2.2    130     145
Total:        128  130   3.1    130     178

READING FROM sites/all/modules/robotstxt/robots.txt

Requests per second:    7.60 [#/sec] (mean)
Time per request:       131.538 [ms] (mean)
Time per request:       131.538 [ms] (mean, across all concurrent requests)
Transfer rate:          16.13 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   128  131   3.0    130     169
Waiting:      122  130   2.6    130     161
Total:        128  131   3.0    130     169

So per request, the variable has only a 170 ms over reading from the file. That's well within the margin of error for the benchmarks and shows that there is really no advantage to setting the variable on install.

hass’s picture

Status: Postponed (maintainer needs more info) » Closed (works as designed)

How have you tested this? You need to compare the below 3 scenarios. I don't expect that you have used two file exists checks and this is faster than the DB check... and as you said there is no advantage to setting and therefore we can stay with the setting as it's much easier to maintain and use for the drupal_set_message() and other reasons. So - by design.

Scenario 1 (current):
1. Read the cached variable from DB -> display the content

Scenario 2:
1. File exists ./robots.txt? Yes -> display the content

Scenario 3 (typical after the change):
1. File exists ./robots.txt? No -> jump to #2
2. File exists [module folder]/robots.txt? Yes -> display the content