Closed (fixed)
Project:
Apache Solr Search
Version:
5.x-2.x-dev
Component:
Code
Priority:
Normal
Category:
Bug report
Assigned:
Unassigned
Issue tags:
Reporter:
Created:
24 Mar 2010 at 08:33 UTC
Updated:
3 Jan 2014 at 01:08 UTC
Jump to comment: Most recent, Most recent file
Comments
Comment #1
skwashd commentedTrying to attach the patch again. It was attached before I submitted :\
Comment #2
robertdouglass commentedOne of the unfortunate problems of this patch is it will cause the need to delete the index and re-index everything.
Comment #3
Scott Reynolds commentedThis strikes me as more then a typo. It has the feel of being done on purpose. Any idea?
Comment #4
pwolanin commentedOf course it's done on purpose - to reduce any possible information leakage about the private key.
Also - there is no reason this would reduce the combinations in any meaningful way, and it's only done once so performance is a non-issue
Comment #5
skwashd commented@robertDouglass it won't need to be reindexed as the key is only generated once and then stored in the variable table.
@pwolanin the key generation code isn't a black box it is available as PHP source code in a public repository and distributed unencoded. There is no (psuedo) random value used so the key is reproducible. There is, at most, only 16^32 possible combinations outputted from the first call of md5(), so you are hashing a subset, then taking a sub set of that string as the final value, leaving you with less than 12^32 possible keys. The second md5() provides no benefit in generating unique or more secret keys. If the aim is "to reduce any possible information leakage about the private key", then introducing some randomness to the seed, such as mt_rand() or time() would better achieve this goal. Reopening as the current code doesn't fulfil its stated purpose.
Comment #6
Scott Reynolds commentedand then stored in the solr schema for every node...
Comment #7
damien tournoud commentedLet's do that. As stated, the key is only generated once, changing the way the key is generated will not impact any existing installation.
Comment #8
damien tournoud commentedThis said, I agree that this should just generate a random value. There is zero point in using the private key here.
The standard way to do that is:
Comment #9
skwashd commented@Scott Reynolds yes, but the key is only generated once then stored, so changing the key generation algorithm has no impact on existing users as their current key will remain unchanged and so not necessitate reindexing.
@Damien Tournoud I'm happy to do it either way, so long as it doesn't involve calling md5(md5()).
I'll wait for followup comments before rerolling.
Comment #10
pwolanin commented@DamZ - no point in the md5 even?
Comment #11
damien tournoud commented@pwolanin: no real point in md5() here, except to normalize the value (in the sense: stretch it to a normalized size).
Comment #12
skwashd commenteduiniqid() only appends the the microseconds on *nix, under windows you only get the hexadecimal value, so you have 1 second windows for race conditions under windows. The risk of collisions under *nix are reduced, but to remove the risk (almost) completely on either platform, you could use the following:
You end up with a unique key that has a random and a known component and the format remains unchanged from current implementation.
Comment #13
damien tournoud commented@skwashd: no, the hex part is the part that contains the time. The rest is controlled by the $more_entropy parameter, and the risk of collision is unbelievably small already.
Comment #14
skwashd commented@Damien Tournoud
OK, I was wrong. Did some RTFMing and leant something today.
Comment #15
pwolanin commentedUniqueness is really not that big a worry. This value is only useful if you are running a multisite search or doing some reindexing tricks, and typically even those people will have just a few sites.
Given that there are other problems with long URLs, I suggest we try to reduce the # of characters further.
Something like:
Which gives us a default hash space of about 2 * 10^7 and is the same as the 12 character hex value.
Or we could go for 5 characters - still about 60 * 10^6 values in the hash space. Seems more than enough.
Comment #16
pwolanin commentedComment #17
skwashd commentedI have ~2100 production drupal sites using an pair of apache solr servers. Although I don't plan provision additional sites any time soon, I'd like to avoid the potential for race conditions when having aegir deploying 20 sites at a time.
Comment #18
skwashd commentedreverting field changes
Comment #19
pwolanin commented6.x-1.x is the main branch for fixes. We can leave this at 6 digits, which should be the same size space as currently.
uniqid() with extra entropy shoudl be pretty immune to race conditions - the only issue is collisions in the hash space. If you are worried about actual race conditions, it would be better even to leave out the $base_url.
Comment #20
pwolanin commentedFixed code comment - committed this patch to 6.x-1.x
Comment #21
robertdouglass commentedCommitted to 6.x-2.x.
Comment #22
jpmckinney commentedFixed in 5-2. http://drupal.org/cvs?commit=361238