Introduction

Welcome to EnronData.org (EDO), the Enron Data Reconstruction Project. The collapse of Enron and the subsequent public release of Enron data by the Federal Energy Regulatory Commission (FERC) produced one of the largest and richest publicly available datasets for email research. This data has been widely and successfully used to support many academic research projects and commercial organizations that require email data; however, much more can be done.

The goals of EnronData.org are to provide alternative derivative datasets and to explain some of the more esoteric aspects of the data. The project grew out of a review of the current state of this rich dataset, which included: (a) examining the data itself, (b) listening to requirements from the community, and (c) observing the questions people had about existing datasets. If you’ve ever wondered why the Enron email is the way it is, we may be able to explain it for you.

Projects being considered by EDO include:

Native PST and NSF Files: reconstituting PST and NSF email as close to its original state as possible, including attachments
Modified Datasets: creating modified datasets for research purposes, e.g., MIME / Maildir with restored headers and attachments, if a need is identified
Directory Load Files: creating files for LDAP servers, Active Directory, and Domino Directory
Metadata Organization: creating EDRM files to associate metadata with the email files