Linguistic Features of NLP-II

These are the more advanced linguistic features of NLP. Let's detect sentences, and then try to make sense of a sentence.

Content

Sentence Detection

Chunks (Trying to understand a sentence)

Other NLP AI Contentserver Articles

Starting Page

Example Linguistic Features of NLP -I

The built-in NLP categorizer

Detecting the language

Content Server NLP application examples

Content Server Autocategorizer

Sentence Detection (Linguistic Features of NLP-II)

The sample text is the opening of a Washington Post article dated March 14, 2024, with the title “Israel says it killed Hamas member in IDF strike on U.N. food center”. Let's see how to detect its sentences. First, the example code:

import java.io.File;
import java.io.FileInputStream; 
import java.io.InputStream;  

import opennlp.tools.sentdetect.SentenceDetectorME; 
import opennlp.tools.sentdetect.SentenceModel; 
import opennlp.tools.util.Span; 
   
public class SentencesAndPosDetection { 
  
   public static void main(String args[]) throws Exception { 
	  System.out.println("\nNLP Sentence detection\n");
	  System.out.println("\nText to examine\n");
	  String h = "Israel says it killed Hamas member in IDF strike on U.N. food center Washington Post Mar 14-2024" ;
      String sen = "Israel acknowledged an airstrike on a food aid distribution center in southern Gaza,"+
                   " which it said targeted and killed a high-ranking member of Hamas."+
    		       " The militant group confirmed that its deputy head of police operations in Rafah was killed."+
                   " UNRWA, the U.N. agency for Palestinian refugees, which runs the center, "+
    		       "said one of its staff members was killed and condemned the attack on one of its 'very few remaining' food distribution facilities"+
                   " as hunger grips the enclave.";
      System.out.println("Header "+h+"\n");
      System.out.println(sen);
      //Loading the sentence model; try-with-resources closes the stream afterwards
      SentenceModel model;
      try (InputStream inputStream = new FileInputStream("models"+File.separator+"en-sent.bin")) {
         model = new SentenceModel(inputStream);
      }
       
      //Instantiating the SentenceDetectorME class 
      SentenceDetectorME detector = new SentenceDetectorME(model);  
       
      //Detecting the position of the sentences in the paragraph  
      Span[] spans = detector.sentPosDetect(sen);  
      
      //Printing the sentences and their spans of a paragraph
      System.out.println("\nThe sentences detected were\n");
      for (Span span : spans)         
         System.out.println(sen.substring(span.getStart(), span.getEnd())+" "+ span);  
   } 
} 

If you run that code, you get the output listed below. The header is only printed; it is not examined.

NLP Sentence detection


Text to examine

Header Israel says it killed Hamas member in IDF strike on U.N. food center Washington Post Mar 14-2024

Israel acknowledged an airstrike on a food aid distribution center in southern Gaza, which it said targeted and killed a high-ranking member of Hamas. The militant group confirmed that its deputy head of police operations in Rafah was killed. UNRWA, the U.N. agency for Palestinian refugees, which runs the center, said one of its staff members was killed and condemned the attack on one of its 'very few remaining' food distribution facilities as hunger grips the enclave.

The sentences detected were

Israel acknowledged an airstrike on a food aid distribution center in southern Gaza, which it said targeted and killed a high-ranking member of Hamas. [0..150)
The militant group confirmed that its deputy head of police operations in Rafah was killed. [151..242)
UNRWA, the U.N. agency for Palestinian refugees, which runs the center, said one of its staff members was killed and condemned the attack on one of its 'very few remaining' food distribution facilities as hunger grips the enclave. [243..473)

So the detector found three sentences. Each span is printed as [start..end), with character offsets and an exclusive end. The first sentence, at [0..150), is “Israel acknowledged an airstrike on a food aid distribution center in southern Gaza, which it said targeted and killed a high-ranking member of Hamas.” The second, at [151..242), is “The militant group confirmed that its deputy head of police operations in Rafah was killed.” The third, at [243..473), is “UNRWA, the U.N. agency for Palestinian refugees, which runs the center, said one of its staff members was killed and condemned the attack on one of its ‘very few remaining’ food distribution facilities as hunger grips the enclave.”
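The offsets plug directly into String.substring, whose second argument is also exclusive. Here is a minimal sketch of that arithmetic with the first two sentences of the article. It uses plain int pairs as a stand-in for OpenNLP's Span, so it runs without the library or a model; the class name SpanSliceDemo and the helper slice are made up for this illustration.

```java
// Sketch only: int offsets stand in for opennlp.tools.util.Span.
class SpanSliceDemo {

    // First two sentences of the article text, separated by a single space.
    static final String TEXT =
        "Israel acknowledged an airstrike on a food aid distribution center in southern Gaza,"
        + " which it said targeted and killed a high-ranking member of Hamas."
        + " The militant group confirmed that its deputy head of police operations in Rafah was killed.";

    // Returns the text covered by the half-open range [start..end).
    static String slice(String text, int start, int end) {
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        // The first two spans reported by the sentence detector in the output above.
        int[][] spans = { {0, 150}, {151, 242} };
        for (int[] s : spans) {
            // The length of a half-open span is simply end - start.
            System.out.println("[" + s[0] + ".." + s[1] + ") length " + (s[1] - s[0]));
            System.out.println(slice(TEXT, s[0], s[1]));
        }
    }
}
```

Offset 150 is the blank between the two sentences, which is why the second span starts at 151 and not at 150.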

Chunkers (Linguistic Features of NLP-II)

Now let's break a sentence into chunks: groups of consecutive words that belong together as a noun group, a verb group, and so on. This is called chunking, and the component that does it is a chunker. The example code:

import java.io.File; 
import java.io.FileInputStream; 
import java.io.IOException; 
import java.io.InputStream;  

import opennlp.tools.chunker.ChunkerME; 
import opennlp.tools.chunker.ChunkerModel; 
import opennlp.tools.cmdline.postag.POSModelLoader; 
import opennlp.tools.postag.POSModel; 
import opennlp.tools.postag.POSTaggerME; 
import opennlp.tools.tokenize.WhitespaceTokenizer; 
import opennlp.tools.util.Span;  

public class ChunkerSpans{ 
   
   public static void main(String args[]) throws IOException { 
      File file = new File("models"+File.separator+"en-pos-maxent.bin");     
      POSModel model = new POSModelLoader().load(file); 
       
      POSTaggerME tagger = new POSTaggerME(model); 
  
      //Header and text to examine; the text is tokenized further down
      String h = "Israel says it killed Hamas member in IDF strike on U.N. food center Washington Post Mar 14-2024" ;
      String sen = "Israel acknowledged an airstrike on a food aid distribution center in southern Gaza,"+
              " which it said targeted and killed a high-ranking member of Hamas."+
		       " The militant group confirmed that its deputy head of police operations in Rafah was killed."+
              " UNRWA, the U.N. agency for Palestinian refugees, which runs the center, "+
		       "said one of its staff members was killed and condemned the attack on one of its 'very few remaining' food distribution facilities"+
              " as hunger grips the enclave.";     
	  System.out.println("\nNLP Sentence detection\n");
	  System.out.println("\nText to examine\n");
      WhitespaceTokenizer whitespaceTokenizer= WhitespaceTokenizer.INSTANCE; 
      String[] tokens = whitespaceTokenizer.tokenize(sen); 
       
      //Generating tags from the tokens 
      String[] tags = tagger.tag(tokens);       
   
      //Loading the chunker model; try-with-resources closes the stream afterwards
      ChunkerModel chunkerModel;
      try (InputStream inputStream = new FileInputStream("models"+File.separator+"en-chunker.bin")) {
         chunkerModel = new ChunkerModel(inputStream);
      }
      ChunkerME chunkerME = new ChunkerME(chunkerModel);
      System.out.println("Header "+h+"\n");
      System.out.println(sen);     
      Span[] span = chunkerME.chunkAsSpans(tokens, tags); 
       
      for (Span s : span) 
         System.out.println(s.toString());  
   }    
}

If you run this program, you’ll see


NLP Sentence detection


Text to examine

Header Israel says it killed Hamas member in IDF strike on U.N. food center Washington Post Mar 14-2024

Israel acknowledged an airstrike on a food aid distribution center in southern Gaza, which it said targeted and killed a high-ranking member of Hamas. The militant group confirmed that its deputy head of police operations in Rafah was killed. UNRWA, the U.N. agency for Palestinian refugees, which runs the center, said one of its staff members was killed and condemned the attack on one of its 'very few remaining' food distribution facilities as hunger grips the enclave.
[0..1) NP
[1..2) VP
[2..4) NP
[4..5) PP
[5..10) NP
[10..11) PP
[11..13) NP
[13..14) NP
[14..15) NP

 (Output cut)
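Each printed line is a range of token indices plus a chunk label. ChunkerME also offers a chunk method that returns one tag per token in the B-/I- style (B-NP starts a noun chunk, I-NP continues it), and the spans are a grouped view of those tags. The sketch below is a hypothetical re-implementation of that grouping in plain Java, not OpenNLP's own code; the class BioToSpansDemo, the method toSpans, and the sample tags are assumptions chosen to reproduce the first four lines of the output above.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: groups per-token B-/I-/O chunk tags into "[start..end) TYPE" strings.
class BioToSpansDemo {

    static List<String> toSpans(String[] bio) {
        List<String> spans = new ArrayList<>();
        int start = -1;       // index of the first token of the open chunk, -1 if none
        String type = null;   // label of the open chunk, for example "NP"
        for (int i = 0; i <= bio.length; i++) {
            // One extra "O" after the last token closes a chunk that is still open.
            String tag = i < bio.length ? bio[i] : "O";
            boolean continues = start >= 0 && tag.equals("I-" + type);
            if (!continues) {
                if (start >= 0) {
                    spans.add("[" + start + ".." + i + ") " + type);
                    start = -1;
                    type = null;
                }
                if (tag.startsWith("B-") || tag.startsWith("I-")) {
                    start = i;
                    type = tag.substring(2);
                }
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        // Assumed tags for the tokens "Israel acknowledged an airstrike on".
        String[] bio = { "B-NP", "B-VP", "B-NP", "I-NP", "B-PP" };
        for (String span : toSpans(bio)) {
            System.out.println(span);
        }
    }
}
```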

As before, we used the Washington Post article.

We shall represent a part of the output in a table and mention the chunks in the last column. Due to the large output, the list is limited to the beginning of the first sentence. The tag set used by the English POS model is the Penn Treebank tag set.

Token          Chunk   ID
Israel         NP      First chunk (noun), [0..1)
acknowledged   VP      Second chunk (verb), [1..2)
an             NP      Third chunk (noun), [2..4)
airstrike      NP
on             PP      Fourth chunk (preposition), [4..5)
a              NP      Start of fifth chunk (noun), [5..10)
food           NP
aid            NP
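The numbers in the chunker output count tokens, not characters, so [5..10) means tokens 5 to 9. A minimal sketch of how to turn such a span back into text follows. It splits on whitespace with String.split as a stand-in for WhitespaceTokenizer, and the class ChunkTextDemo and the helper chunkText are made up for this illustration; the spans and labels are copied from the program output.

```java
import java.util.Arrays;

// Sketch only: maps token-index spans back to the words they cover.
class ChunkTextDemo {

    // Joins tokens[start..end) with single spaces; end is exclusive.
    static String chunkText(String[] tokens, int start, int end) {
        return String.join(" ", Arrays.copyOfRange(tokens, start, end));
    }

    public static void main(String[] args) {
        String text = "Israel acknowledged an airstrike on a food aid distribution center in southern Gaza,";
        // Whitespace splitting keeps punctuation attached, so "Gaza," is one token.
        String[] tokens = text.split("\\s+");

        // First seven spans and labels from the chunker output.
        int[][] spans = { {0, 1}, {1, 2}, {2, 4}, {4, 5}, {5, 10}, {10, 11}, {11, 13} };
        String[] labels = { "NP", "VP", "NP", "PP", "NP", "PP", "NP" };

        for (int i = 0; i < spans.length; i++) {
            System.out.println(labels[i] + "  " + chunkText(tokens, spans[i][0], spans[i][1]));
        }
    }
}
```

Because the tokenizer only splits on whitespace, the comma stays glued to "Gaza," inside the last noun chunk. A tokenizer that separates punctuation would shift the token indices.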

This is a really nice feature for understanding the structure of a sentence.