This is a name and location finder example for NERs (Named Entities Recognition) which is a nice way to find names, locations and other things in the text based on a trained model. A basic training might be required.
Content
Other NLP AI Contentserver Articles
Example Linguistic Features of NLP- Part1
Example Linguistic Features of NLP- Part2
a name and location finder example
Content Server NLP application examples
Content Server Autocategorizer
Example
Let’s start with the code
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.Span;
public class NameFinderExample {
public static void main(String[] args) {
// find person name
System.out.println("Names and places find example\n");
String sentence = "Peter Miller is standing next to a bus stop in Houston and is waiting for John";
System.out.println("\nTest Sentence \n"+sentence+"\n");
try {
System.out.println("\nNames find\n===========");
new NameFinderExample().findName(sentence);
System.out.println();
} catch (IOException e) {
e.printStackTrace();
}
// find place
try {
System.out.println("Places find\n===========");
new NameFinderExample().findLocation(sentence);
System.out.println();
} catch (IOException e) {
e.printStackTrace();
}
}
/**
* method to find locations in the sentence
* @throws IOException
*/
public void findName(String sentence) throws IOException {
InputStream is = new FileInputStream("models"+File.separator+"en-ner-person.bin");
// load the model from file
TokenNameFinderModel model = new TokenNameFinderModel(is);
is.close();
// feed the model to name finder class
NameFinderME nameFinder = new NameFinderME(model);
// input string array
String[] testSentence =sentence.split(" ");
Span nameSpans[] = nameFinder.find(testSentence);
// nameSpans contain all the possible entities detected
for(Span s: nameSpans){
System.out.print(s.toString());
System.out.print(" : ");
// s.getStart() : contains the start index of possible name in the input string array
// s.getEnd() : contains the end index of the possible name in the input string array
for(int index=s.getStart();index<s.getEnd();index++){
System.out.print(testSentence[index]+" ");
}
System.out.println();
}
}
public void findLocation(String sentence) throws IOException {
InputStream is = new FileInputStream("models"+File.separator+"en-ner-location.bin");
// load the model from file
TokenNameFinderModel model = new TokenNameFinderModel(is);
is.close();
// feed the model to name finder class
NameFinderME nameFinder = new NameFinderME(model);
// input string array
String[] testSentence =sentence.split(" ");
Span nameSpans[] = nameFinder.find(testSentence);
// nameSpans contain all the possible entities detected
for(Span s: nameSpans){
System.out.print(s.toString());
System.out.print(" : ");
// s.getStart() : contains the start index of possible name in the input string array
// s.getEnd() : contains the end index of the possible name in the input string array
for(int index=s.getStart();index<s.getEnd();index++){
System.out.print(testSentence[index]+" ");
}
System.out.println();
}
}
}
when you run this code, you’ll get
Names and places find example
Test Sentence
Peter Miller is standing next to a bus stop in Houston and is waiting for John
Names find
===========
[0..2) person : Peter Miller
[15..16) person : John
Places find
===========
[10..11) location : Houston
here, the test sentence is “Peter Miller is standing next to a bus stop in Houston and is waiting for John” used in both cases, the name and the loction findings.
The token “[0..2) person : Peter Miller” and “[15..16) person : John” both refer to the names “Peter Miller” and “John”.
The token “[10..11) location : Houston” refers to the location “Houston”, which was found in the test-sentence.
Training
Althoung we used a pre-trained model, let’s repeat the training,
The name/location the system is supposed to learn, is plain text and tagged like these:
My name is <START> Michael Hinterhofer <END>.
The NER is defined by <START> and <END>. (See below)
<START:named_entitiy_type>Named Entity<END> remaining sentence.
An example could be :
<START:person>Johny<END> and<START:person>Ricky<END> are brothers.
Note : If there is only one named entity type, mentioning named_entity_type is not required.
<START>Johny<END> and<START>Ricky<END> are brothers.
Multiple types could be given in a single training file.
An example for training sentence having multiple types is :
<START:person>Johny<END> and<START:person>Ricky<END> are <START:relation>brothers<END>.
The type is mentioned after the <START: tag.
See annotation examples on Github https://github.com/mccraigmccraig/opennlp/blob/master/src/test/resources/opennlp/tools/namefind/AnnotatedSentencesWithTypes.txt
As always, a lot helps a lot. The more sentences you have, the better. For production, you shoud contain at least 15000 sentences