a name and location finder example

This example shows a name and location finder based on NER (Named Entity Recognition), a technique for finding names, locations and other entities in text with a trained model. Some basic training might be required.

Content

Example

Training

Other NLP AI Contentserver Articles

Starting Page

Example Linguistic Features of NLP- Part1

Example Linguistic Features of NLP- Part2

The built-in NLP categorizer

Detecting the language

a name finder example

a name and location finder example

Content Server NLP application examples

Content Server Autocategorizer

Example

Let’s start with the code

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
 
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.Span;
 

public class NameFinderExample {
 
    public static void main(String[] args) {
        // find person names
        System.out.println("Names and places find example\n");
        String sentence = "Peter Miller is standing next to a bus stop in Houston and is waiting for John";
        System.out.println("\nTest Sentence \n"+sentence+"\n");
        try {
            System.out.println("\nNames find\n===========");
            new NameFinderExample().findName(sentence);
            System.out.println();
        } catch (IOException e) {
            e.printStackTrace();
        }
         
        // find place
        try {
            System.out.println("Places find\n===========");
            new NameFinderExample().findLocation(sentence);
            System.out.println();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
 
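    /**
     * Finds person names in the sentence and prints each span with its tokens.
     * @param sentence the text to search; tokens are expected to be separated by single spaces
     * @throws IOException if the model file cannot be read
     */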
    public void findName(String sentence) throws IOException {
        // load the model from file; try-with-resources closes the stream even if loading fails
        TokenNameFinderModel model;
        try (InputStream is = new FileInputStream("models"+File.separator+"en-ner-person.bin")) {
            model = new TokenNameFinderModel(is);
        }
 
        // feed the model to name finder class
        NameFinderME nameFinder = new NameFinderME(model);
 
        // input string array
        String[] testSentence =sentence.split(" "); 
 
 
        Span nameSpans[] = nameFinder.find(testSentence);
 
        // nameSpans contains every entity the model detected
        for(Span s: nameSpans){
            System.out.print(s.toString());
            System.out.print("  :  ");
            // s.getStart() : index of the first token of the entity in the input string array
            // s.getEnd() : index one past the last token of the entity (exclusive)
            for(int index=s.getStart();index<s.getEnd();index++){
                System.out.print(testSentence[index]+" ");
            }
            System.out.println();
        }
    }
     

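    /**
     * Finds locations in the sentence and prints each span with its tokens.
     * @param sentence the text to search; tokens are expected to be separated by single spaces
     * @throws IOException if the model file cannot be read
     */
    public void findLocation(String sentence) throws IOException {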
        // load the model from file; try-with-resources closes the stream even if loading fails
        TokenNameFinderModel model;
        try (InputStream is = new FileInputStream("models"+File.separator+"en-ner-location.bin")) {
            model = new TokenNameFinderModel(is);
        }
 
        // feed the model to name finder class
        NameFinderME nameFinder = new NameFinderME(model);
 
        // input string array
        String[] testSentence =sentence.split(" "); 
 

 
        Span nameSpans[] = nameFinder.find(testSentence);
 
        // nameSpans contains every entity the model detected
        for(Span s: nameSpans){
            System.out.print(s.toString());
            System.out.print("  :  ");
            // s.getStart() : index of the first token of the entity in the input string array
            // s.getEnd() : index one past the last token of the entity (exclusive)
            for(int index=s.getStart();index<s.getEnd();index++){
                System.out.print(testSentence[index]+" ");
            }
            System.out.println();
        }
    }
     
}

When you run this code, you’ll get:

Names and places find example

Test Sentence
Peter Miller is standing next to a bus stop in Houston and is waiting for John

Names find
===========
[0..2) person : Peter Miller
[15..16) person : John

Places find
===========
[10..11) location : Houston

Here, the test sentence “Peter Miller is standing next to a bus stop in Houston and is waiting for John” is used in both cases, the name finding and the location finding.

The spans “[0..2) person : Peter Miller” and “[15..16) person : John” refer to the names “Peter Miller” and “John”. The numbers are token indices into the split sentence: the start index is inclusive and the end index is exclusive, so [0..2) covers tokens 0 and 1.

The span “[10..11) location : Houston” refers to the location “Houston”, which was found in the test sentence.
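To make the bracket notation concrete, here is a minimal sketch that maps the three spans from the output back to their tokens. It uses only the Java standard library: TokenSpan is a hypothetical stand-in for opennlp.tools.util.Span, and the span values are copied from the output rather than computed by a model. With OpenNLP on the classpath, Span.spansToStrings(spans, tokens) should give the same joined text.

```java
import java.util.Arrays;

final class SpanIndexDemo {

    // Hypothetical stand-in for opennlp.tools.util.Span, so the sketch runs without OpenNLP.
    static final class TokenSpan {
        final int start;   // index of the first token (inclusive)
        final int end;     // index one past the last token (exclusive)
        final String type;

        TokenSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        // Joins tokens[start] .. tokens[end - 1] with single spaces.
        String coveredText(String[] tokens) {
            return String.join(" ", Arrays.copyOfRange(tokens, start, end));
        }

        // Mirrors the "[start..end) type" notation of the example output.
        @Override
        public String toString() {
            return "[" + start + ".." + end + ") " + type;
        }
    }

    public static void main(String[] args) {
        String[] tokens =
            "Peter Miller is standing next to a bus stop in Houston and is waiting for John".split(" ");
        // Span values copied from the example output, not produced by a model here.
        TokenSpan[] spans = {
            new TokenSpan(0, 2, "person"),
            new TokenSpan(15, 16, "person"),
            new TokenSpan(10, 11, "location")
        };
        for (TokenSpan s : spans) {
            System.out.println(s + " : " + s.coveredText(tokens));
        }
    }
}
```

Because the end index is exclusive, the number of tokens in a span is simply end minus start.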

Training

Although we used a pre-trained model, let’s walk through the training.

The names and locations the system is supposed to learn are given as plain text, one sentence per line, tagged like this:

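My name is <START> Michael Hinterhofer <END> .

The named entity is delimited by <START> and <END>. The training data is whitespace-tokenized, so each tag and each punctuation mark must be separated from its neighbours by a space. The general form is:

<START:named_entity_type> Named Entity <END> remaining sentence .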

An example:

<START:person> Johny <END> and <START:person> Ricky <END> are brothers .

Note: If there is only one named entity type, the named_entity_type can be omitted.

<START> Johny <END> and <START> Ricky <END> are brothers .

Multiple types can be used in a single training file.

An example of a training sentence with multiple types:

<START:person> Johny <END> and <START:person> Ricky <END> are <START:relation> brothers <END> .

The type follows the colon in the <START: tag.
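To show how such a line breaks down into tokens and typed entities, here is a minimal sketch of a reader for the annotation format. It is not OpenNLP's own parser: the class and method names are hypothetical, it uses only the standard library, and it assumes that every tag and every token is separated by whitespace, as in the annotated file linked below. Untyped <START> tags are labelled "default" here, which is an assumption made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AnnotatedSentenceDemo {

    // Matches "<START>" or "<START:type>" as a whole token; group 1 is the type, if any.
    private static final Pattern START = Pattern.compile("<START(?::([^<>\\s]+))?>");

    // Returns one "type [start..end) text" entry per annotated entity in the line.
    static List<String> entities(String line) {
        List<String> tokens = new ArrayList<>();
        List<String> found = new ArrayList<>();
        String type = null;
        int start = -1;
        for (String part : line.trim().split("\\s+")) {
            Matcher m = START.matcher(part);
            if (m.matches()) {
                // Assumption: "default" is the label used here for untyped entities.
                type = m.group(1) == null ? "default" : m.group(1);
                start = tokens.size();
            } else if (part.equals("<END>")) {
                if (start < 0) {
                    throw new IllegalArgumentException("<END> without <START>: " + line);
                }
                String text = String.join(" ", tokens.subList(start, tokens.size()));
                found.add(type + " [" + start + ".." + tokens.size() + ") " + text);
                start = -1;
            } else {
                tokens.add(part); // ordinary word or punctuation token
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String line = "<START:person> Johny <END> and <START:person> Ricky <END> "
                + "are <START:relation> brothers <END> .";
        entities(line).forEach(System.out::println);
    }
}
```

The tags themselves are not counted as tokens, so the indices refer to the plain sentence "Johny and Ricky are brothers ." and use the same half-open convention as the spans printed by the finder.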

See the annotation examples on GitHub: https://github.com/mccraigmccraig/opennlp/blob/master/src/test/resources/opennlp/tools/namefind/AnnotatedSentencesWithTypes.txt

As always, more data helps: the more sentences you have, the better. For production, the training data should contain at least 15000 sentences.
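Before starting a training run, it can be worth checking the training file against these points. The following is a minimal sketch of such a check, not part of OpenNLP: the class name, the method names and the check itself are hypothetical, and it assumes one sentence per line with whitespace-separated tags. It counts the non-empty lines and reports lines whose <START> and <END> tags do not pair up or are glued to a word.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class TrainingDataCheck {

    static final int RECOMMENDED_SENTENCES = 15000; // figure quoted in the text above

    // Matches "<START>" or "<START:type>" as a whole token.
    private static final Pattern START = Pattern.compile("<START(?::[^<>\\s]+)?>");

    // True if every start tag is closed by "<END>" before the next one opens
    // and no tag is glued to a word.
    static boolean tagsWellFormed(String line) {
        boolean open = false;
        for (String part : line.trim().split("\\s+")) {
            if (START.matcher(part).matches()) {
                if (open) return false;
                open = true;
            } else if (part.equals("<END>")) {
                if (!open) return false;
                open = false;
            } else if (part.contains("<START") || part.contains("<END>")) {
                return false; // e.g. "and<START:person>Ricky<END>"
            }
        }
        return !open;
    }

    // Counts non-empty lines, assuming one sentence per line.
    static long countSentences(List<String> lines) {
        return lines.stream().filter(l -> !l.trim().isEmpty()).count();
    }

    // Returns the 1-based numbers of all lines that fail tagsWellFormed.
    static List<Integer> malformedLines(List<String> lines) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (!tagsWellFormed(lines.get(i))) bad.add(i + 1);
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TrainingDataCheck.java <training-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        long sentences = countSentences(lines);
        System.out.println("Sentences: " + sentences);
        System.out.println("Malformed lines: " + malformedLines(lines));
        if (sentences < RECOMMENDED_SENTENCES) {
            System.out.println("Fewer than " + RECOMMENDED_SENTENCES + " sentences.");
        }
    }
}
```

A small file will still pass the tag check; the sentence count is only a reminder of the recommended size.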