Email is one of the most important document formats for analysis in natural-language processing, particularly for surveillance and spam detection. The following functions form a basis for handling email-format data.


An MBOX file as a table of parsed metadata

Syntax: .nlp.loadEmails x

Where x is the filepath as a string, returns a table with the following columns.

column       type                            content
-----------------------------------------------------------------------------
sender       string                          Name and address of sender
to           string                          Name and address of receiver/s
date         timestamp                       Date
subject      string                          Subject
text         string                          Original text
contentType  string                          Content type
payload      string or list of dictionaries  Payload

The MBOX file is the most common format for storing email messages on a hard drive. All the messages in a mailbox are stored in a single long text file as a sequence of concatenated email messages, each starting with the From header of the message.
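To make the file layout concrete, the following is a minimal sketch in Python (not q, and not the library's implementation) that writes two concatenated messages to a temporary file and reads them back with the standard-library `mailbox` module. The sample addresses, the file name `sample.mbox` and the helper `read_mbox` are illustrative assumptions; only three of the columns listed above are extracted.

```python
import mailbox
import os
import tempfile

# Two concatenated messages; each one begins with a "From " separator line.
# The addresses and contents are made up for illustration.
RAW_MBOX = (
    "From alice@example.com Mon Jan  1 10:00:00 2018\n"
    "From: Alice <alice@example.com>\n"
    "To: Bob <bob@example.com>\n"
    "Subject: Hello\n"
    "\n"
    "First message body.\n"
    "\n"
    "From bob@example.com Mon Jan  1 11:00:00 2018\n"
    "From: Bob <bob@example.com>\n"
    "To: Alice <alice@example.com>\n"
    "Subject: Re: Hello\n"
    "\n"
    "Second message body.\n"
)


def read_mbox(path):
    """Return one dict of selected headers per message in the mbox file."""
    box = mailbox.mbox(path)
    try:
        return [
            {"sender": msg["From"], "to": msg["To"], "subject": msg["Subject"]}
            for msg in box
        ]
    finally:
        box.close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.mbox")  # hypothetical file name
    with open(path, "w", newline="\n") as f:
        f.write(RAW_MBOX)
    rows = read_mbox(path)

for row in rows:
    print(row)
```

Each dictionary corresponds to one row of a table like the one described above; the remaining columns (date, text, contentType, payload) are omitted to keep the sketch short.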

q)cols email

Graph of who emailed whom, with the number of times they emailed

Syntax: x

Where x is a table (the result of .nlp.loadEmails), returns a table of to-from pairings.


sender                           to                               volume
------------------------------------------------------------------------           1               1                  1          1          1          1            1                1             1          1                   2                     3                  3                 3                 3                 3             3                 3               3                   1
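The idea behind the sender/to/volume table can be sketched as a count of (sender, to) pairs. The Python below is an illustration under assumed inputs, not the library's code: the function name `email_graph` and the sample addresses are hypothetical, and each message is assumed to have a single receiver.

```python
from collections import Counter


def email_graph(rows):
    """Count how many times each sender emailed each receiver."""
    counts = Counter((row["sender"], row["to"]) for row in rows)
    # One output row per distinct (sender, to) pair, sorted for stable output.
    return [
        {"sender": sender, "to": to, "volume": volume}
        for (sender, to), volume in sorted(counts.items())
    ]


# Hypothetical parsed emails, reduced to the two columns the graph needs.
rows = [
    {"sender": "alice@example.com", "to": "bob@example.com"},
    {"sender": "alice@example.com", "to": "bob@example.com"},
    {"sender": "bob@example.com", "to": "alice@example.com"},
]

graph = email_graph(rows)
for edge in graph:
    print(edge)
```

A message with several receivers would need to be expanded into one pair per receiver before counting; that step is left out here.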

Parses an email in string format

Syntax: x

Where x is an email in a string format, returns a dictionary of the headers and content.

q) emailString
q)cols table
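For comparison, parsing a single email held in a string into a dictionary of headers and content can be sketched with Python's standard `email` package. This is an assumption-laden illustration rather than the function documented here: the helper `parse_email`, the choice of keys and the sample message are all hypothetical.

```python
from email import message_from_string

# A made-up, single-part plain-text email held in a string.
RAW_EMAIL = (
    "From: Alice <alice@example.com>\n"
    "To: Bob <bob@example.com>\n"
    "Subject: Hello\n"
    "Content-Type: text/plain\n"
    "\n"
    "Body text.\n"
)


def parse_email(raw):
    """Return a dict of selected headers and the payload of one email."""
    msg = message_from_string(raw)
    return {
        "sender": msg["From"],
        "to": msg["To"],
        "subject": msg["Subject"],
        "contentType": msg.get_content_type(),
        # A string for single-part mail, a list of parts for multipart mail.
        "payload": msg.get_payload(),
    }


parsed = parse_email(RAW_EMAIL)
print(sorted(parsed))
```

For a multipart message, `get_payload()` returns a list of sub-messages instead of a string, which parallels the "string or list of dictionaries" payload type listed for the loaded table.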