The EDRM group has released volume 2 of its Enron email data set. It’s an enormous trove of email from the Enron cases, available in XML or PST format.
This will likely become the gold standard for software testing, as well as reviewer testing. Want to know if potential reviewers can find a known smoking gun? Test them on this set, not on actual case data. Want to know how your review software will handle the strain of a few million records? The Enron set should allow apples-to-apples comparison of software performance; there are no more excuses for piddly 50- or 100-email “test” sets.
The data should also be fairly easy to customize, and to mix and match with other data, letting providers create far better testing and sampling programs.
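To make the "mix and match" idea concrete, here is a minimal sketch of how a provider might assemble a reviewer test set: draw a reproducible random sample from the corpus, then plant a few known "smoking gun" documents in it. Everything here is an assumption for illustration. The function names, the document IDs, and the idea of identifying messages by a simple string ID are hypothetical and are not part of the EDRM release.

```python
import random


def build_test_set(corpus_ids, planted_ids, sample_size, seed=0):
    """Mix a reproducible random sample of the corpus with planted documents.

    corpus_ids and planted_ids are hypothetical string IDs; how you derive
    them from the XML or PST files is up to your own tooling.
    """
    rng = random.Random(seed)  # fixed seed so the same set can be rebuilt
    planted = set(planted_ids)
    # Sample only from documents that are not already planted.
    pool = sorted(set(corpus_ids) - planted)
    sample = rng.sample(pool, sample_size)  # raises ValueError if pool is too small
    mixed = sample + sorted(planted)
    rng.shuffle(mixed)  # hide the planted documents among the sample
    return mixed


def recall(found_ids, planted_ids):
    """Fraction of planted documents that a reviewer (or tool) flagged."""
    planted = set(planted_ids)
    if not planted:
        return 0.0
    return len(planted & set(found_ids)) / len(planted)


if __name__ == "__main__":
    corpus = [f"msg-{i}" for i in range(1000)]  # stand-in for real message IDs
    hot_docs = ["hot-1", "hot-2"]               # hypothetical planted documents
    test_set = build_test_set(corpus, hot_docs, sample_size=50, seed=42)
    print(len(test_set), "documents in the test set")
    print("recall:", recall(["hot-1", "msg-3"], hot_docs))
```

Because the seed is fixed, two vendors or two reviewers can be scored against exactly the same set, which is what makes the comparison apples-to-apples.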