EDRM Enron Data Set with Attachments

I’m pleased to announce that, as of the EDRM 2009-2010 Kick-Off Meeting, an initial version of the EDRM Enron Email Data Set is available within the EDRM project. It consists of 40 GB of PST files with attachments and folder structure. The EDRM Data Set Project is now working to make this data set publicly available.

This initial data set was created by a team at ZL Technologies and me. More work remains, however, and I think the EDRM Data Set Project is an ideal group to lead the effort to publish some industry-standard data sets.

Some of the issues that the EDRM Data Set Project will be looking at include addressing privacy concerns, publishing smaller slices of the data set, and choosing distribution methods for large data sets. If you would like to participate in this process, please join EDRM.

John Wang
EnronData.org
EDRM Data Set Project Lead

Enron Data at 2009-2010 EDRM Kick-Off Meeting

A number of people have contacted me about getting the current PST corpus by alternative means, partly because of the bandwidth restrictions on the HTTP download. I planned to add other download methods but haven’t had time yet. Until then, if you will be at the EDRM Kick-Off Meeting and would still like a copy, bring a 1+ GB USB key and find me there. If you are interested, please let me know beforehand so I can plan ahead.