These are the more advanced linguistic features of NLP. Let's detect sentences and then try to get the sense of a sentence.
Content
Chunks (Trying to understand a sentence)
Other NLP AI Contentserver Articles
Example Linguistic Features of NLP -I
Content Server NLP application examples
Content Server Autocategorizer
Sentence Detection (Linguistic Features of NLP-II)
As text, we use the first part of a Washington Post article dated March 14, 2024, with the title “Israel says it killed Hamas member in IDF strike on U.N. food center”. Let's see how to detect the sentences. First, the example code:
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
import opennlp.tools.util.Span;

public class SentencesAndPosDetection {
    public static void main(String[] args) throws Exception {
        System.out.println("\nNLP Sentence detection\n");
        System.out.println("\nText to examine\n");
        // The header is only printed; it is not passed to the detector
        String h = "Israel says it killed Hamas member in IDF strike on U.N. food center Washington Post Mar 14-2024";
        String sen = "Israel acknowledged an airstrike on a food aid distribution center in southern Gaza,"
                + " which it said targeted and killed a high-ranking member of Hamas."
                + " The militant group confirmed that its deputy head of police operations in Rafah was killed."
                + " UNRWA, the U.N. agency for Palestinian refugees, which runs the center, "
                + "said one of its staff members was killed and condemned the attack on one of its 'very few remaining' food distribution facilities"
                + " as hunger grips the enclave.";
        System.out.println("Header " + h + "\n");
        System.out.println(sen);

        // Load the sentence model; try-with-resources closes the stream afterwards
        SentenceModel model;
        try (InputStream inputStream = new FileInputStream("models" + File.separator + "en-sent.bin")) {
            model = new SentenceModel(inputStream);
        }

        // Instantiate the SentenceDetectorME class
        SentenceDetectorME detector = new SentenceDetectorME(model);

        // Detect the character positions of the sentences in the paragraph
        Span[] spans = detector.sentPosDetect(sen);

        // Print each sentence followed by its span
        System.out.println("\nThe sentences detected were\n");
        for (Span span : spans) {
            System.out.println(sen.substring(span.getStart(), span.getEnd()) + " " + span);
        }
    }
}
If you run that code, you get the output listed below. The header is not examined.
NLP Sentence detection
Text to examine
Header Israel says it killed Hamas member in IDF strike on U.N. food center Washington Post Mar 14-2024
Israel acknowledged an airstrike on a food aid distribution center in southern Gaza, which it said targeted and killed a high-ranking member of Hamas. The militant group confirmed that its deputy head of police operations in Rafah was killed. UNRWA, the U.N. agency for Palestinian refugees, which runs the center, said one of its staff members was killed and condemned the attack on one of its 'very few remaining' food distribution facilities as hunger grips the enclave.
The sentences detected were
Israel acknowledged an airstrike on a food aid distribution center in southern Gaza, which it said targeted and killed a high-ranking member of Hamas. [0..150)
The militant group confirmed that its deputy head of police operations in Rafah was killed. [151..242)
UNRWA, the U.N. agency for Palestinian refugees, which runs the center, said one of its staff members was killed and condemned the attack on one of its 'very few remaining' food distribution facilities as hunger grips the enclave. [243..473)
So the system found three sentences. The first is at positions 0-150: “Israel acknowledged an airstrike on a food aid distribution center in southern Gaza, which it said targeted and killed a high-ranking member of Hamas.” The second is at positions 151-242: “The militant group confirmed that its deputy head of police operations in Rafah was killed.” The third is at positions 243-473: “UNRWA, the U.N. agency for Palestinian refugees, which runs the center, said one of its staff members was killed and condemned the attack on one of its ‘very few remaining’ food distribution facilities as hunger grips the enclave.”
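The bracketed offsets are half-open: the start index is inclusive and the end index is exclusive. That is why the first sentence ends at 150 and the second starts at 151; the space at index 150 belongs to neither span. The following minimal sketch illustrates this convention with plain Java strings on a short made-up text. `CharSpan` and `extract` are hypothetical names standing in for OpenNLP's `Span`, so the example runs without a model file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SpanOffsetsSketch {

    /** Hypothetical stand-in for OpenNLP's Span: start is inclusive, end is exclusive. */
    static final class CharSpan {
        final int start;
        final int end;

        CharSpan(int start, int end) {
            this.start = start;
            this.end = end;
        }

        /** Uses the same half-open convention as String.substring(start, end). */
        String coveredText(String text) {
            return text.substring(start, end);
        }

        @Override
        public String toString() {
            return "[" + start + ".." + end + ")";
        }
    }

    /** Cuts the text into the pieces described by the spans. */
    static List<String> extract(String text, List<CharSpan> spans) {
        List<String> pieces = new ArrayList<>();
        for (CharSpan span : spans) {
            pieces.add(span.coveredText(text));
        }
        return pieces;
    }

    public static void main(String[] args) {
        // Made-up text with hand-written offsets; this is not detector output.
        String text = "First sentence. Second one.";
        List<CharSpan> spans = Arrays.asList(new CharSpan(0, 15), new CharSpan(16, 27));

        for (CharSpan span : spans) {
            System.out.println(span.coveredText(text) + " " + span);
        }
        // Index 15 is the space between the two sentences; no span covers it.
        System.out.println("Gap character: '" + text.charAt(15) + "'");
    }
}
```

Because the end index is exclusive, the length of a sentence is simply end minus start, and adjacent spans never overlap.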
Chunkers (Linguistic Features of NLP-II)
Now let's break the sentence into groups of consecutive words that belong together, such as a noun group or a verb group. These groups are called chunks, and the tool that finds them is called a chunker. The example code:
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.chunker.ChunkerModel;
import opennlp.tools.cmdline.postag.POSModelLoader;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.WhitespaceTokenizer;
import opennlp.tools.util.Span;

public class ChunkerSpans {
    public static void main(String[] args) throws IOException {
        // Load the part-of-speech model and create the tagger
        File file = new File("models" + File.separator + "en-pos-maxent.bin");
        POSModel model = new POSModelLoader().load(file);
        POSTaggerME tagger = new POSTaggerME(model);

        // The header is only printed; it is not chunked
        String h = "Israel says it killed Hamas member in IDF strike on U.N. food center Washington Post Mar 14-2024";
        String sen = "Israel acknowledged an airstrike on a food aid distribution center in southern Gaza,"
                + " which it said targeted and killed a high-ranking member of Hamas."
                + " The militant group confirmed that its deputy head of police operations in Rafah was killed."
                + " UNRWA, the U.N. agency for Palestinian refugees, which runs the center, "
                + "said one of its staff members was killed and condemned the attack on one of its 'very few remaining' food distribution facilities"
                + " as hunger grips the enclave.";
        System.out.println("\nNLP Sentence detection\n");
        System.out.println("\nText to examine\n");

        // Tokenize the text on whitespace (punctuation stays attached to the words)
        WhitespaceTokenizer whitespaceTokenizer = WhitespaceTokenizer.INSTANCE;
        String[] tokens = whitespaceTokenizer.tokenize(sen);

        // Generate one part-of-speech tag per token
        String[] tags = tagger.tag(tokens);

        // Load the chunker model; try-with-resources closes the stream afterwards
        ChunkerModel chunkerModel;
        try (InputStream inputStream = new FileInputStream("models" + File.separator + "en-chunker.bin")) {
            chunkerModel = new ChunkerModel(inputStream);
        }
        ChunkerME chunkerME = new ChunkerME(chunkerModel);

        System.out.println("Header " + h + "\n");
        System.out.println(sen);

        // Each span covers a range of token indices and carries the chunk type
        Span[] spans = chunkerME.chunkAsSpans(tokens, tags);
        for (Span s : spans) {
            System.out.println(s.toString());
        }
    }
}
If you run this program, you’ll see
NLP Sentence detection
Text to examine
Header Israel says it killed Hamas member in IDF strike on U.N. food center Washington Post Mar 14-2024
Israel acknowledged an airstrike on a food aid distribution center in southern Gaza, which it said targeted and killed a high-ranking member of Hamas. The militant group confirmed that its deputy head of police operations in Rafah was killed. UNRWA, the U.N. agency for Palestinian refugees, which runs the center, said one of its staff members was killed and condemned the attack on one of its 'very few remaining' food distribution facilities as hunger grips the enclave.
[0..1) NP
[1..2) VP
[2..4) NP
[4..5) PP
[5..10) NP
[10..11) PP
[11..13) NP
[13..14) NP
[14..15) NP
(Output cut)
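Each line of this output is a span over token indices, not characters: `[2..4) NP` means that tokens 2 and 3 together form one noun phrase. OpenNLP's `ChunkerME.chunk` method returns the same information as one label per token, where `B-` marks the beginning of a chunk, `I-` its continuation, and `O` a token outside any chunk. The sketch below shows, in plain Java, how such labels can be merged into spans. The label array is hand-written to match the first five spans of the output, and `TokenSpan` and `toSpans` are hypothetical names, not OpenNLP API.

```java
import java.util.ArrayList;
import java.util.List;

public class BioToSpansSketch {

    /** Hypothetical span over token indices: start inclusive, end exclusive. */
    static final class TokenSpan {
        final int start;
        final int end;
        final String type;

        TokenSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        @Override
        public String toString() {
            return "[" + start + ".." + end + ") " + type;
        }
    }

    /** Merges per-token B-/I-/O labels into typed spans. */
    static List<TokenSpan> toSpans(String[] bioTags) {
        List<TokenSpan> spans = new ArrayList<>();
        int start = -1;      // index where the open chunk began, -1 if none is open
        String type = null;  // type of the open chunk
        for (int i = 0; i < bioTags.length; i++) {
            String tag = bioTags[i];
            boolean continues = start >= 0 && tag.startsWith("I-") && tag.substring(2).equals(type);
            if (continues) {
                continue;
            }
            // Any other label closes the chunk that is currently open.
            if (start >= 0) {
                spans.add(new TokenSpan(start, i, type));
                start = -1;
                type = null;
            }
            // B- opens a new chunk; a stray I- is leniently treated the same way.
            if (tag.startsWith("B-") || tag.startsWith("I-")) {
                start = i;
                type = tag.substring(2);
            }
        }
        if (start >= 0) {
            spans.add(new TokenSpan(start, bioTags.length, type));
        }
        return spans;
    }

    public static void main(String[] args) {
        // Hand-written labels for: Israel acknowledged an airstrike on a food aid distribution center
        String[] tags = {"B-NP", "B-VP", "B-NP", "I-NP", "B-PP", "B-NP", "I-NP", "I-NP", "I-NP", "I-NP"};
        for (TokenSpan span : toSpans(tags)) {
            System.out.println(span);
        }
    }
}
```

With these labels the sketch prints `[0..1) NP`, `[1..2) VP`, `[2..4) NP`, `[4..5) PP` and `[5..10) NP`, the same five spans that open the chunker output.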
As before, we used the Washington Post Article.
We show part of the output as a table and note the chunks in the last column. Because the output is long, the table is limited to the beginning of the first sentence. The tag set used by the English POS model is the Penn Treebank tag set.
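Token | Chunk type | Chunk |
Israel | NP | First chunk (noun phrase), span [0..1) |
acknowledged | VP | Second chunk (verb phrase), span [1..2) |
an | NP | Third chunk (noun phrase), span [2..4) |
airstrike | NP | |
on | PP | Fourth chunk (prepositional phrase), span [4..5) |
a | NP | Start of fifth chunk (noun phrase), span [5..10) |
food | NP | |
aid | NP | |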
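The mapping from spans to words can also be done programmatically instead of by hand. The sketch below splits the beginning of the first sentence on whitespace, which stands in for `WhitespaceTokenizer`, and joins the tokens covered by each span. The span boundaries and chunk types are copied from the first five lines of the program output; the helper `phrase` is a hypothetical name.

```java
import java.util.Arrays;

public class ChunkPhrasesSketch {

    /** Joins the tokens in the half-open index range [start, end) with single spaces. */
    static String phrase(String[] tokens, int start, int end) {
        return String.join(" ", Arrays.copyOfRange(tokens, start, end));
    }

    public static void main(String[] args) {
        // First ten whitespace-separated tokens of the example text.
        String text = "Israel acknowledged an airstrike on a food aid distribution center";
        String[] tokens = text.split("\\s+");

        // Token-index spans and chunk types copied from the chunker output.
        int[][] spans = {{0, 1}, {1, 2}, {2, 4}, {4, 5}, {5, 10}};
        String[] types = {"NP", "VP", "NP", "PP", "NP"};

        for (int i = 0; i < spans.length; i++) {
            System.out.println(types[i] + ": " + phrase(tokens, spans[i][0], spans[i][1]));
        }
    }
}
```

This prints one line per chunk: `NP: Israel`, `VP: acknowledged`, `NP: an airstrike`, `PP: on` and `NP: a food aid distribution center`.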
This is a really nice feature for understanding the structure of a sentence.