openNLP command line – Part 1

openNLP command line – openNLP provides a Command Line Interface (CLI) to carry out different operations through the command line. This is part 1 of the command line description. The part 2 is here

Content

The Command line

Usage

Example of the Sentence Detector

Example of the Named Entities

The main groups

Document Categorizer

Language Detector

The Sentence Detector

The Name Finder

The POSTagger

The Lemmatizer

The Chunker

Other NLP AI Contentserver Articles

Starting Page

Example Linguistic Features of NLP- Part1

Example Linguistic Features of NLP- Part2

The buildin NLP categorizer

Detecting the language

a name finder example

a name and location finder example

Content Server NLP application examples

Content Server Autocategorizer

The Command line

Type “openNLP” to see the complete list of commands

These are runnable jars and can be used directly from the command line.

Lets take a look on two examples on the usage.

Usage

Sentence Detector

Inputfile input.txt

Hello. How are you doing? We at ebit company help you use openNLP and OpenTexts ContentServer. It is the new free AI from Apache.

Doing the Sentence Detection

opennlp SentenceDetector <path to models>en-sent.bin < input.txt > output.txt

Output-File output.txt

Loading Sentence Detector model ...
done (0,049s)

Hello.
How are you doing?
We at ebit company help you use openNLP and OpenTexts ContentServer.
It is the new free AI from Apache.
Average: 266,7 sent/s

Total: 4 sent
Runtime: 0.015s
Execution time: 0,081 seconds

Named Entities

Inputfile input.txt

<START:person> Reiner Merz<END> is senior Content Server and AI specialist and 
<START:person> Alisa<END> works at the Irish Pub in Stuttgart 

Doing the Name Find

opennlp TokenNameFinder <path to models>en-ner-person.bin <input.txt > output.txt

Output-File output.txt

Loading Token Name Finder model ...
done (0,582s)

<START:person> Reiner Merz<END> is senior Content Server and AI specialist and
<START:person> Alisa<END> works at the Irish Pub in Stuttgart
Average: 100,0 sent/s

Total: 2 sent
Runtime: 0.02s
Execution time: 0,659 seconds

Main Groups

Document Categorizer

This Document Categorizer group has 5 members

  • Doccat – Learned document categorizer itself
  • DoccatTrainer – Trainer for the learnable document categorizer
  • DoccatEvaluator – Measures the performance of the Doccat model with the reference data
  • DoccatCrossValidator – K-fold cross validator for the learnable Document Categorizer
  • DoccatConverter – Converts 20newsgroup data format to native OpenNLP format

Language Detector

the Language Detector group has 5 members

  • LanguageDetector – Learned language detector
  • LanguageDetectorTrainer – Trainer for the learnable language detector
  • LanguageDetectorConverter – Converts leipzig data format to native OpenNLP format
  • LanguageDetectorCrossValidator – K-fold cross validator for the learnable Language Detector
  • LanguageDetectorEvaluator – Measures the performance of the Language Detector model with the reference data

The Sentence Detector

The Sentence Detector group hat 5 members

  • SentenceDetector – Learnable sentence detector
  • SentenceDetectorTrainer – Trainer for the learnable sentence detector
  • SentenceDetectorEvaluator – Measures the performance of the learnable sentence detector
  • SentenceDetectorCrossValidator – K-fold cross validator for the learnable sentence detector
  • SentenceDetectorConverter – Converts external data formats (nkjp,irishsentencebank,ad,pos,masc,conllx,namefinder,parse,moses,conllu,letsmt) to native OpenNLP format

The Name Finder

The Name Finder group has 7 members

  • TokenNameFinder – Learnable name finder
  • TokenNameFinderTrainer – Trainer for the learnable name finder
  • TokenNameFinderEvaluator – Measures the performance of the NameFinder model with the reference data
  • TokenNameFinderCrossValidator – K-fold cross validator for the learnable Name Finder
  • TokenNameFinderConverter – Converts external data formats (evalita,ad,conll03,bionlp2004,conll02,masc,muc6,ontonotes,brat) to native OpenNLP format
  • CensusDictionaryCreator – Converts 1990 US Census names into a dictionary

The POSTagger

The POSTagger group has 5 members

  • POSTagger – Learnable part of speech tagger
  • POSTaggerTrainer – Trains a model for the part-of-speech tagger
  • POSTaggerEvaluator – Measures the performance of the POS tagger model with the reference data
  • POSTaggerCrossValidator – K-fold cross validator for the learnable POS tagger
  • POSTaggerConverter – Converts external data formats (ad,masc,conllx,parse,ontonotes,conllu) to native OpenNLP format

The Lemmatizer

The Lemmatizer group has 3 members

  • LemmatizerME – Learnable lemmatizer
  • LemmatizerTrainerME – Trainer for the learnable lemmatizer
  • LemmatizerEvaluator – Measures the performance of the Lemmatizer model with the reference data

The Chunker

The Chunker Group has 5 members

  • ChunkerME – Learnable chunker
  • ChunkerTrainerME – Trainer for the learnable chunker
  • ChunkerEvaluator – Measures the performance of the Chunker model with the reference data
  • ChunkerCrossValidator – K-fold cross validator for the chunker
  • ChunkerConverter – Converts ad data format to native OpenNLP format