Research

The research studies listed below reference the Enron Email Dataset. Studies that focus on other datasets but briefly describe the Enron corpus are also included. Let us know if there are additional studies that should be listed.

  1. Enron Background
    1. The Western Energy Crisis, the Enron Bankruptcy, and FERC’s Response.
    2. Matus, Roger and Sean True. Email Liability, Compliance, and Policy Management Risk: A Case Study of the Enron Corporation. (Concord, MA. 2003).
    3. U.S. Probing Shredding of Data at Enron. Los Angeles Times.
  2. Annotation
    1. Ulrich, Jan, Gabriel Murray, and Giuseppe Carenini. A Publicly Available Annotated Corpus for Supervised Email Summarization. AAAI-2008 EMAIL Workshop (Chicago, IL. July 2008). Focuses on the W3C corpus.
  3. Attachments
    1. Dredze, Mark, John Blitzer, and Fernando Pereira. “Sorry, I Forgot the Attachment:” Email Attachment Prediction. CEAS, 2006.
  4. Classification
    1. Bekkerman, Ron, Andrew McCallum, and Gary Huang. Automatic Categorization of Email into Folders: Benchmark Experiments on Enron and SRI Corpora. CIIR Technical Report IR-418, 2004 (Amherst, MA. 2005).
  5. Datasets
    1. Berry, Michael W. and Murray Browne. The 2001 Annotated (by Topic) Enron Email Data Set (Philadelphia, PA. 2007).
    2. Klimt, Bryan and Yiming Yang. Introducing the Enron Corpus. In Proceedings of CEAS 2004 (Mountain View, CA. 2004).
    3. Shetty, Jitesh and Jafar Adibi. The Enron Email Dataset: Database Schema and Brief Statistical Report. Information Sciences Institute Technical Report, University of Southern California. (Marina del Rey, CA. 2004).
    4. Waterman, K. Krasnow. Knowledge Discovery in Corporate Email: The Compliance Bot Meets Enron (Cambridge, MA. 2006).
    5. Zhou, Yingjie, Malik Magdon-Ismail, and Al Wallace. Strategies for Cleaning Organizational Emails with an Application to Enron Email Dataset. 5th Conf. of North American Association for Computational Social and Organizational Science (NAACSOS 07). (Emory University, Atlanta, GA. June 7-9, 2007).
  6. Information Retrieval
    1. Stockinger, Kurt, Doron Rotem, Arie Shoshani, and Kesheng Wu. Enron Data Revisited – Neighborhood Queries with FastBit Win over Popular Commercial Database System. (Berkeley, CA. 2006).
    2. Stockinger, Kurt, Doron Rotem, Arie Shoshani, and Kesheng Wu. Analyzing Enron Data: Bitmap Indexing Outperforms MySQL Queries by Several Orders of Magnitude. (Berkeley, CA. 2006).
  7. Social Networks
    1. Diesner, Jana and Kathleen M. Carley. Exploration of Communication Networks From the Enron Email Corpus. In Proceedings of the SIAM International Conference on Data Mining, Workshop on Link Analysis, Counterterrorism and Security, pp. 3-14. (Newport Beach, CA. April 2005).
    2. Corrada-Emmanuel, Andres, Andrew McCallum, and Xuerui Wang. Language Use in a Social Network: The Enron Email Dataset. (Amherst, MA. 2004).
  8. Structure
    1. Keila, P.S. and D.B. Skillicorn. Structure in the Enron Email Dataset. In Proceedings of the SIAM International Conference on Data Mining, SDM 2005. (Newport Beach, CA. April 23, 2005).
  9. Surveillance
    1. Berry, Michael W. and Murray Browne. Text Mining Approaches for Email Surveillance. Massive Data Sets Workshop, Stanford/Yahoo!, 2006.
  10. Threading
    1. Deepak P, Dinesh Garg, and Virendra K. Varshney. Analysis of Enron Email Threads and Quantification of Employee Responsiveness (Bangalore, India).
    2. Yeh, Jen-Yuan and Aaron Harnly. Email Thread Reassembly Using Similarity Matching. In Proceedings of CEAS 2006 (Mountain View, CA. 2006).