In this situation we start from the premise that the original email message contains NO "References:" or "In-Reply-To:" header.

In this case, threads are sometimes connected wrongly when the subject is the same. This happens mostly with short generic subjects and with empty ones, e.g. a "Hello" from 7 years ago got linked to a "Hello" from 2 weeks ago, although the two messages do not belong to the same thread.

Possible solution:
Test whether the other message with the same subject is below a certain age, e.g. if the subjects are the same but the messages are 30 weeks apart, it is unlikely that they belong together. The threshold could be a configurable value.
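A minimal sketch of that age check, written in Python only for illustration and not taken from the module's code. The function name, the constant, and the 30-week default are assumptions; the real threshold would come from a setting.

```python
from datetime import datetime, timedelta

# Hypothetical default; the idea is that this would be a configurable setting.
MAX_THREAD_GAP = timedelta(weeks=30)


def may_thread_by_subject(new_subject, new_date, old_subject, old_date,
                          max_gap=MAX_THREAD_GAP):
    """Return True only if the subjects match AND the messages are close in time.

    Subjects are compared after trimming whitespace and ignoring case.
    """
    same_subject = (new_subject.strip().casefold()
                    == old_subject.strip().casefold())
    if not same_subject:
        return False
    # Same subject, but too far apart in time: treat as unrelated.
    return abs(new_date - old_date) <= max_gap
```

With this check, the "Hello" from 7 years ago would no longer be picked as the parent of a new "Hello", while a matching subject from two weeks earlier would still be accepted.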

Maybe add more probability factors into the equation, e.g. the mail client "User-Agent:" header or the sender IP, since for the same user they are the same most of the time within a short time frame. And the same users posting within a short time frame are more likely to belong to the same thread.
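One way such a combined score could look, again as a rough Python sketch rather than actual module code. The dictionary keys, the weights, and the two-day window are all made-up assumptions and would need tuning.

```python
from datetime import datetime, timedelta


def thread_match_score(new_msg, candidate, short_window=timedelta(days=2)):
    """Return a rough score; higher means more likely the same thread.

    Both messages are plain dicts with the hypothetical keys
    'subject', 'date', 'sender', 'user_agent' and 'ip'.
    The weights below are arbitrary placeholders.
    """
    score = 0
    if (new_msg["subject"].strip().casefold()
            == candidate["subject"].strip().casefold()):
        score += 2

    # The extra signals are only meaningful within a short time frame.
    if abs(new_msg["date"] - candidate["date"]) <= short_window:
        score += 1
        for key in ("sender", "user_agent", "ip"):
            value = new_msg.get(key)
            if value and value == candidate.get(key):
                score += 1
    return score
```

A candidate would then only be accepted as the parent if its score is above some threshold, which could also be a setting.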

Maybe the field value 'last_comment_timestamp' => $node->changed can be used.