openNLP command line – Part 1 | The Joy of Content Server

openNLP command line – openNLP provides a Command Line Interface (CLI) to carry out different operations through the command line. This is part 1 of the command line description. The part 2 is here

Content

The Command line

Usage

Example of the Sentence Detector

Example of the Named Entities

The main groups

Document Categorizer

Language Detector

The Sentence Detector

Other NLP AI Contentserver Articles

Starting Page

Example Linguistic Features of NLP- Part1

Example Linguistic Features of NLP- Part2

The buildin NLP categorizer

Detecting the language

a name finder example

a name and location finder example

Content Server NLP application examples

Content Server Autocategorizer

The Command line

Type “openNLP” to see the complete list of commands

These are runnable jars and can be used directly from the command line.

Lets take a look on two examples on the usage.

Usage

Sentence Detector

Inputfile input.txt

Hello. How are you doing? We at ebit company help you use openNLP and OpenTexts ContentServer. It is the new free AI from Apache.

Doing the Sentence Detection

opennlp SentenceDetector <path to models>en-sent.bin < input.txt > output.txt

Output-File output.txt

Loading Sentence Detector model ...
done (0,049s)

Hello.
How are you doing?
We at ebit company help you use openNLP and OpenTexts ContentServer.
It is the new free AI from Apache.
Average: 266,7 sent/s

Total: 4 sent
Runtime: 0.015s
Execution time: 0,081 seconds

Named Entities

Inputfile input.txt

<START:person> Reiner Merz<END> is senior Content Server and AI specialist and 
<START:person> Alisa<END> works at the Irish Pub in Stuttgart

Doing the Name Find

opennlp TokenNameFinder <path to models>en-ner-person.bin <input.txt > output.txt

Output-File output.txt

Loading Token Name Finder model ...
done (0,582s)

<START:person> Reiner Merz<END> is senior Content Server and AI specialist and
<START:person> Alisa<END> works at the Irish Pub in Stuttgart
Average: 100,0 sent/s

Total: 2 sent
Runtime: 0.02s
Execution time: 0,659 seconds

Main Groups

Document Categorizer

This Document Categorizer group has 5 members

Doccat – Learned document categorizer itself
DoccatTrainer – Trainer for the learnable document categorizer
DoccatEvaluator – Measures the performance of the Doccat model with the reference data
DoccatCrossValidator – K-fold cross validator for the learnable Document Categorizer
DoccatConverter – Converts 20newsgroup data format to native OpenNLP format

Language Detector

the Language Detector group has 5 members

LanguageDetector – Learned language detector
LanguageDetectorTrainer – Trainer for the learnable language detector
LanguageDetectorConverter – Converts leipzig data format to native OpenNLP format
LanguageDetectorCrossValidator – K-fold cross validator for the learnable Language Detector
LanguageDetectorEvaluator – Measures the performance of the Language Detector model with the reference data

The Sentence Detector

The Sentence Detector group hat 5 members

SentenceDetector – Learnable sentence detector
SentenceDetectorTrainer – Trainer for the learnable sentence detector
SentenceDetectorEvaluator – Measures the performance of the learnable sentence detector
SentenceDetectorCrossValidator – K-fold cross validator for the learnable sentence detector
SentenceDetectorConverter – Converts external data formats (nkjp,irishsentencebank,ad,pos,masc,conllx,namefinder,parse,moses,conllu,letsmt) to native OpenNLP format

The Name Finder

The Name Finder group has 7 members

TokenNameFinder – Learnable name finder
TokenNameFinderTrainer – Trainer for the learnable name finder
TokenNameFinderEvaluator – Measures the performance of the NameFinder model with the reference data
TokenNameFinderCrossValidator – K-fold cross validator for the learnable Name Finder
TokenNameFinderConverter – Converts external data formats (evalita,ad,conll03,bionlp2004,conll02,masc,muc6,ontonotes,brat) to native OpenNLP format
CensusDictionaryCreator – Converts 1990 US Census names into a dictionary

The POSTagger

The POSTagger group has 5 members

POSTagger – Learnable part of speech tagger
POSTaggerTrainer – Trains a model for the part-of-speech tagger
POSTaggerEvaluator – Measures the performance of the POS tagger model with the reference data
POSTaggerCrossValidator – K-fold cross validator for the learnable POS tagger
POSTaggerConverter – Converts external data formats (ad,masc,conllx,parse,ontonotes,conllu) to native OpenNLP format

The Lemmatizer

The Lemmatizer group has 3 members

LemmatizerME – Learnable lemmatizer
LemmatizerTrainerME – Trainer for the learnable lemmatizer
LemmatizerEvaluator – Measures the performance of the Lemmatizer model with the reference data

The Chunker

The Chunker Group has 5 members

ChunkerME – Learnable chunker
ChunkerTrainerME – Trainer for the learnable chunker
ChunkerEvaluator – Measures the performance of the Chunker model with the reference data
ChunkerCrossValidator – K-fold cross validator for the chunker
ChunkerConverter – Converts ad data format to native OpenNLP format