Skip to content
Machine learning

Useful functions

These are functions that extract elements of the text that can be applied to NLP algorithms, or that can help you with your analysis.

.nlp.findTimes

All the times in a document

Syntax: .nlp.findTimes x

Where x is a string, returns a general list:

  1. time
  2. text of the time (string)
  3. start index (long)
  4. index after the end index (long)
q).nlp.findTimes "I went to work at 9:00am and had a coffee at 10:20"
09:00:00.000 "9:00am" 18 24
10:20:00.000 "10:20"  45 50

.nlp.findDates

All the dates in a document

Syntax: .nlp.findDates x

Where x is a string

returns a general list:

  1. start date of the range
  2. end date of the range
  3. text of the range
  4. start index of the date (long)
  5. index after the end index (long)
q).nlp.findDates "I am going on holidays on the 12/04/2018 to New York and come back on the 18.04.2018"
2018.04.12 2018.04.12 "12/04/2018" 30 40
2018.04.18 2018.04.18 "18.04.2018" 74 84

.nlp.getSentences

A document partitioned into sentences.

Syntax: .nlp.getSentences x

Where x is a dictionary or a table of document records or subcorpus, returns a list of strings.

/finds the sentences in the first chapter of MobyDick
q) .nlp.getSentences corpus[0]

.nlp.loadTextFromDir

All the files in a directory, imported recursively

Syntax: .nlp.loadTextFromDir x

Where x the directory’s filepath as a string, returns a table of filenames, paths and texts.

q).nlp.loadTextFromDir["./datasets/maildir/skilling-j"]

fileName path                                           text                 ..
-----------------------------------------------------------------------------..
1.       :./datasets/maildir/skilling-j/_sent_mail/1.   "Message-ID: <1461010..
10.      :./datasets/maildir/skilling-j/_sent_mail/10.  "Message-ID: <1371054..
100.     :./datasets/maildir/skilling-j/_sent_mail/100. "Message-ID: <47397.1..
101.     :./datasets/maildir/skilling-j/_sent_mail/101. "Message-ID: <2486283..