simpleNLP 1.0

A simple classifier that reads text from a comma-separated values (CSV) file and evaluates whether a medical condition is described as present or absent in each text. The application can be configured to search for any condition and can be adjusted to work with different kinds of input text.

simpleNLP.zip - An archive which contains:

  • simpleNLP.exe - A Windows Application
  • config.txt - Configuration File
  • data.csv - Sample data file

Getting Started:

  1. Download the zip file and unpack the three files into one folder.
  2. Modify the configuration file as needed to define the condition of interest.
  3. Replace the data file with your own data. The application reads all files in this folder named *.csv. Each row in a data file is a new report, with the full text of the report in the first column; a data file can have any other columns as needed. The first row holds the column names.
  4. Run the application. It generates a new CSV file that adds result columns to the original.

You can run the application with the provided config and data files and review the output in the newly generated file, result_data.csv.
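
The input layout described in step 3 can be sketched in Python as follows. This is only an illustration of the expected file shape, not part of simpleNLP itself: the function names and the "report_text" column name are hypothetical, and the application may accept any header name for the first column.

```python
import csv
from pathlib import Path


def write_reports_csv(path, reports, extra_columns=None):
    """Write reports to a CSV file with the report text in the first column.

    `reports` is a list of report strings. `extra_columns` is an optional
    dict mapping a column name to a list of values, one per report.
    The column name "report_text" is a hypothetical choice.
    """
    extra_columns = extra_columns or {}
    header = ["report_text"] + list(extra_columns)
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)  # the first row holds the column names
        for index, text in enumerate(reports):
            extras = [values[index] for values in extra_columns.values()]
            writer.writerow([text] + extras)  # one report per row


def read_report_texts(folder):
    """Return {file name: [report texts]} for every *.csv file in `folder`."""
    texts = {}
    for csv_path in sorted(Path(folder).glob("*.csv")):
        with open(csv_path, newline="", encoding="utf-8") as handle:
            rows = list(csv.reader(handle))
        # Skip the header row and keep only the first column of each data row.
        texts[csv_path.name] = [row[0] for row in rows[1:] if row]
    return texts
```

Calling write_reports_csv with a list of report strings produces a file that follows the layout above, and read_report_texts can be used to check that each report landed in the first column before running the application on the folder.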

The algorithm was developed in Perl and is described here: Lingua::DxExtractor.

The Perl version of this script is available here: Source

This Windows executable file was created by running:
>pp --gui -o simpleNLP.exe simpleNLP.pl -M "Text::CSV_PP"

You can also rename the executable to simpleNLP.zip and open it as an archive to read the underlying source code.
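
As an alternative to opening the archive by hand, its contents could be listed from a script. The sketch below uses Python's standard zipfile module and assumes the executable is readable as a zip archive, as described above; this has not been verified against simpleNLP.exe, and the function name is hypothetical.

```python
import zipfile


def list_packed_files(archive_path):
    """Return the names of the files stored in the given zip archive."""
    with zipfile.ZipFile(archive_path) as archive:
        return archive.namelist()
```

For example, list_packed_files("simpleNLP.zip") would return the stored file names, which can then be searched for the Perl script.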