openNLP vs Spacy for Contentserver

openNLP vs Spacy for Contentserver is the first part of a comparism between this two AI NLP packages in the Content Server environment.

Goto the start opennlp series of acticles: Starting Page

Spacy

openNLP vs Spacy for Contentserver Part 1

FeatureopenNLPSpacy
Programming-LanguageJava, at least  JDK 17Python at least 3.1, Python Environment must be installed, like
Connect to the Content ServerJVM inside of the Content Server or REST APIREST API, restricted use with Content Server
Open DocumentsApache TIKA as frontend processorApache TIKA as frontend processor, REST or batch connection
Supported Languages (ISO Language Codes)fr, de, en, it, nl,da, es, pt,se as pretrained models
ca,zh, hr, da, nl, en, fi, fr, de, el, it, ja, ko, lt, mk, nb, pl, pt, ro, ru, sl, es, sv, uk, af, sq, am, grc, ar, eu, bn, bg, cs, et, fo, gu, he, hi, hu, is, id, ga, hn, ki, la, lv, lij, dsb, lg, ms, ml, mr, ne, nn, fa, sa, sr , tn, si, sk, tl, ta, tt, te, hh, ti, tr, hsb, ur, vi, yo as pretrained models
Trainable Languagesyesyes
Word Vectorsexperimentalin large models supported

Installing Spacy

Spacy

We had an overview of Spacy, which you’ll find here. Today, we shall discuss the installation of Spacy.

So without further redo, lets dive in the installation, shall we?

Content

Installation

Step 1: Select the configuation

Step 2: Download

Visual Studio Code extension

The main installation page is https://spacy.io/usage

Installation

Install a Python Environmen as prerequisite. Use a newer Python.

Remark: I use Anaconda as free environment.

Step 1: Select the configuation

Select operating system, platform and the pre-trained models.

For example, if you chose to download a windows version for x86 processors and you have a modern graphics card (like the NVIDIA A4000), then you can select GPU with CUDA 11.2-11x option. Here in this picture,the languages English, French, German and Romanian are selected

Step 2: Download

Your configuration will result in a couple of command line entries. You should copy/paste this in your environment.

Remark: I use Anaconda as free environment.

After having executed the pasted lines, they download the components elected.

Visual Studio Code extension

there is also a Visual Studio code extension for Spacy, which can be used as IDE-

It can be found at https://marketplace.visualstudio.com/items?itemName=Explosion.spacy-extension

Spacy

Spacy

Spacy is an Python based popular Open Source AI – NLP (natural language processing) package for 75+ languages including supporting Word Vectors. Spacy supports also new graphics processors.

Content

How to find it?

Package naming conventions

Capabilities

POS Tagging

Morphology

Lemmatization

Dependency Parsing

Named Entity Recognition

Word vectors and semantic similarity

Goto the start opennlp series of acticles: Starting Page

Spacy vs openNLP Part 1 Pros and Cons

Spacy

How to find it?

The website is spaCy · Industrial-strength Natural Language Processing in Python.

It has also several downloadable pretrained models, in these languages

Package naming conventions

In general, spaCy uses for all pipeline packages to follow the naming convention of [lang]_[name]. For spaCy’s pipelines, there is also chose to divide of the name into three components:

  1.  Type: Capabilities (e.g. core for general-purpose pipeline with tagging, parsing, lemmatization and named entity recognition, or dep for only tagging, parsing and lemmatization).
  2.  Genre: Type of text the pipeline is trained on, e.g. web or news.
  3.  Size: Package size indicator, smmdlg or trf
    sm and trf pipelines have no static word vectors. For pipelines with default vectors, md has a reduced word vector table with 20k unique vectors for ~500k words and lg has a large word vector table with ~500k entries. For pipelines with floret vectors, md vector tables have 50k entries and lg vector tables have 200k entries.

Example en_core_web_md

For example, en_core_web_md is a medium English model trained on written text , that includes vocabulary, syntax and entities.

The larger models have the word vertors buildin.

SIZEMD 40 MB
COMPONENTS tok2vectaggerparsersenterattribute_rulerlemmatizerner
PIPELINE tok2vectaggerparserattribute_rulerlemmatizerner
VECTORS 514k keys, 20k unique vectors (300 dimensions)
DOWNLOAD LINK en_core_web_md-3.7.1-py3-none-any.whl
SOURCES OntoNotes 5 (Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, Ann Houston)
ClearNLP Constituent-to-Dependency Conversion (Emory University)
WordNet 3.0 (Princeton University)
Explosion Vectors (OSCAR 2109 + Wikipedia + OpenSubtitles + WMT News Crawl) (Explosion)

short summary of the linguistic Capabilities

POS Tagging

After tokenization, spaCy can parse and tag a given Doc. This is where the trained pipeline and its statistical models come in, which enable spaCy to make predictions of which tag or label most likely applies in this context. A trained component includes binary data that is produced by showing a system enough examples for it to make predictions that generalize across the language – for example, a word following “the” in English is most likely a noun.

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("IBM is looking at buying U.K. startup for $1 billion")

for token in doc:
    print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_,
            token.shape_, token.is_alpha, token.is_stop)

If you run that Python code, you'll get
IBM IBM PROPN NNP nsubj XXX True False
is be AUX VBZ aux xx True True
looking look VERB VBG ROOT xxxx True False
at at ADP IN prep xx True True
buying buy VERB VBG pcomp xxxx True False
U.K. U.K. PROPN NNP dobj X.X. False False
startup startup NOUN NN dep xxxx True False
for for ADP IN prep xxx True True
$ $ SYM $ quantmod $ False False
1 1 NUM CD compound d False False
billion billion NUM CD pobj xxxx True False

this means

TEXTLEMMAPOSTAGDEPSHAPEALPHASTOP
IBMibmPROPNNNPnsubjXxxxxTrueFalse
isbeAUXVBZauxxxTrueTrue
lookinglookVERBVBGROOTxxxxTrueFalse
atatADPINprepxxTrueTrue
buyingbuyVERBVBGpcompxxxxTrueFalse
U.K.u.k.PROPNNNPcompoundX.X.FalseFalse
startupstartupNOUNNNdobjxxxxTrueFalse
forforADPINprepxxxTrueTrue
$$SYM$quantmod$FalseFalse
11NUMCDcompounddFalseFalse
billionbillionNUMCDpobjxxxxTrueFalse

If you use one of the spacy visualizers, you’ll get this image

Nice, isn’t it?

Goto TOP

Morphology

Inflectional morphology is the process by which a root form of a word is modified by adding prefixes or suffixes that specify its grammatical function but do not change its part-of-speech. We say that a lemma (root form) is inflected (modified/combined) with one or more morphological features to create a surface form. Here are some examples:

CONTEXTSURFACELEMMAPOSMORPHOLOGICAL FEATURES
I was reading the paperreadingreadVERBVerbForm=Ger
I don’t watch the news, I read the paperreadreadVERBVerbForm=FinMood=IndTense=Pres
I read the paper yesterdayreadreadVERBVerbForm=FinMood=IndTense=Past

import spacy

nlp = spacy.load(“en_core_web_sm”)
print(“Pipeline:”, nlp.pipe_names)
doc = nlp(“I am going to dinner”)
token = doc[0] # ‘I’
print(token.morph) # ‘Case=Nom|Number=Sing|Person=1|PronType=Prs’
print(token.morph.get(“PronType”)) # [‘Prs’]

if you run this code, you’ll get

Case=Nom|Number=Sing|Person=1|PronType=Prs

['Prs']


Goto TOP

Lemmatization

as always, a Lemmatizer takes the word into its basic form

import spacy

# English pipelines include a rule-based lemmatizer
nlp = spacy.load("en_core_web_sm")
lemmatizer = nlp.get_pipe("lemmatizer")
print(lemmatizer.mode) # 'rule'

doc = nlp("I was taking the paper.")
print([token.lemma_ for token in doc])

if you run this code, you’ll get

rule
['I', 'be', 'take', 'the', 'paper', '.']

Goto TOP

Dependency Parsing

spaCy features a syntactic dependency parser, and has an API for navigating the tree. The parser also powers the sentence boundary detection, and lets you iterate over base noun phrases, or “chunks”.

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Credit and mortgage account holders must submit their requests")
span = doc[doc[4].left_edge.i : doc[4].right_edge.i+1]
with doc.retokenize() as retokenizer:
    retokenizer.merge(span)
for token in doc:
    print(token.text, token.pos_, token.dep_, token.head.text)

if you’ll run this code, you’ll get

C:\Users\merz\AppData\Local\anaconda3\python.exe K:\cspython\src\REST\test.py 
Credit nmod 0 2 ['account', 'holders', 'submit']
and cc 0 0 ['Credit', 'account', 'holders', 'submit']
mortgage conj 0 0 ['Credit', 'account', 'holders', 'submit']
account compound 1 0 ['holders', 'submit']
holders nsubj 1 0 ['submit']
['credit', 'and', 'mortgage', 'account', 'holder', 'should', 'submit', 'their', 'request']


TEXT	DEP	N_LEFTS	N_RIGHTS	ANCESTORS
Credit	nmod	0	2	holders, submit
and	cc	0	0	holders, submit
mortgage	compound	0	0	account, Credit, holders, submit
account	conj	1	0	Credit, holders, submit
holders	nsubj	1	0	submit

Finally, the .left_edge and .right_edge attributes can be especially useful, because they give you the first and last token of the subtree. This is the easiest way to create a Span object for a syntactic phrase. Note that .right_edge gives a token within the subtree – so if you use it as the end-point of a range, don’t forget to +1!

Goto TOP

Named Entity Recognition

spaCy has an fast statistical entity recognition system, that assigns labels to contiguous spans of tokens. The default trained pipelines can identify a variety of named and numeric entities, including companies, locations, organizations and products. You can add arbitrary classes to the entity recognition system, and update the model with new examples.

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("San Francisco considers banning delivery robots")

# document level
ents = [(e.text, e.start_char, e.end_char, e.label_) for e in doc.ents]
print(ents)

# token level
ent_san = [doc[0].text, doc[0].ent_iob_, doc[0].ent_type_]
ent_francisco = [doc[1].text, doc[1].ent_iob_, doc[1].ent_type_]
print(ent_san)  # ['San', 'B', 'GPE']
print(ent_francisco)  # ['Francisco', 'I', 'GPE']

if you’ll run that code, you’ll get

C:\Users\merz\AppData\Local\anaconda3\python.exe K:\cspython\src\REST\test.py 
[('San Francisco', 0, 13, 'GPE')]
['San', 'B', 'GPE']
['Francisco', 'I', 'GPE']

If you use the builtin visualizer of spacy to visualizte the NER, you’ll get this:

Goto TOP

Word vectors and semantic similarity

import spacy

nlp = spacy.load("de_core_news_lg")
tokens = nlp("dog cat banane afskfsd")

for token in tokens:
    print(token.text, token.has_vector, token.vector_norm, token.is_oov)

if run run that code, you’ll get this

C:\Users\merz\AppData\Local\anaconda3\python.exe K:\cspython\src\REST\test.py 
hund True 45.556004 False
katze True 40.768963 False
banane True 22.76727 False
afskfsd False 0.0 True

The words “hund” (in German its a “dog”) , “katze” (in German its a “cat”) and “banane” (in German its a “banana”) are all pretty common in German, so they’re part of the pipeline’s vocabulary, and come with a vector. The word “afskfsd” on the other hand is a lot less common and out-of-vocabulary – so its vector representation consists of 300 dimensions of 0, which means it’s practically nonexistent

spaCy can compare two objects, and make a prediction of how similar they are. Predicting similarity is useful for building recommendation systems or flagging duplicates. For example, you can suggest a user content that’s similar to what they’re currently looking at, or label a support ticket as a duplicate if it’s very similar to an already existing one.

Goto TOP

Apache Tika

Apache Tika

Apache Tika – Open and extract text from virtually all formats. Its called a “content analysis toolkit”

Its free like openNLP and its used to extract text from virtually all file formats possible.

Content

Using Apache Tika as a command line utility

Apache Tika Example: Extracting MS Office

Goto the start opennlp series of acticles: Starting Page

The complete configuration of our AI Content Server extension is like this:

Apache Tika and openNLP

You can get TIKA at https://tika.apache.org/ . How to install is explained on this URL

A nice overview of TIKA is here https://www.tutorialspoint.com/tika/tika_overview.htm

A very nice book on TIKA at Manning https://www.manning.com/books/tika-in-action

Apache Tika

Using Apache Tika as a command line utility

The basic usage (w/o programming) is a command line utility. Its used like

usage: java -jar tika-app.jar [option...] [file|port...]

Options:
    -?  or --help          Print this usage message
    -v  or --verbose       Print debug level messages
    -V  or --version       Print the Apache Tika version number

    -g  or --gui           Start the Apache Tika GUI
    -s  or --server        Start the Apache Tika server
    -f  or --fork          Use Fork Mode for out-of-process extraction

    --config=<tika-config.xml>
        TikaConfig file. Must be specified before -g, -s, -f or the dump-x-config !
    --dump-minimal-config  Print minimal TikaConfig
    --dump-current-config  Print current TikaConfig
    --dump-static-config   Print static config
    --dump-static-full-config  Print static explicit config

    -x  or --xml           Output XHTML content (default)
    -h  or --html          Output HTML content
    -t  or --text          Output plain text content
    -T  or --text-main     Output plain text content (main content only)
    -m  or --metadata      Output only metadata
    -j  or --json          Output metadata in JSON
    -y  or --xmp           Output metadata in XMP
    -J  or --jsonRecursive Output metadata and content from all
                           embedded files (choose content type
                           with -x, -h, -t or -m; default is -x)
    -l  or --language      Output only language
    -d  or --detect        Detect document type
           --digest=X      Include digest X (md2, md5, sha1,
                               sha256, sha384, sha512
    -eX or --encoding=X    Use output encoding X
    -pX or --password=X    Use document password X
    -z  or --extract       Extract all attachements into current directory
    --extract-dir=<dir>    Specify target directory for -z
    -r  or --pretty-print  For JSON, XML and XHTML outputs, adds newlines and
                           whitespace, for better readability

    --list-parsers
         List the available document parsers
    --list-parser-details
         List the available document parsers and their supported mime types
    --list-parser-details-apt
         List the available document parsers and their supported mime types in apt format.
    --list-detectors
         List the available document detectors
    --list-met-models
         List the available metadata models, and their supported keys
    --list-supported-types
         List all known media types and related information


    --compare-file-magic=<dir>
         Compares Tika's known media types to the File(1) tool's magic directory

Description:
    Apache Tika will parse the file(s) specified on the
    command line and output the extracted text content
    or metadata to standard output.

    Instead of a file name you can also specify the URL
    of a document to be parsed.

    If no file name or URL is specified (or the special
    name "-" is used), then the standard input stream
    is parsed. If no arguments were given and no input
    data is available, the GUI is started instead.

- GUI mode

    Use the "--gui" (or "-g") option to start the
    Apache Tika GUI. You can drag and drop files from
    a normal file explorer to the GUI window to extract
    text content and metadata from the files.

- Batch mode

    Simplest method.
    Specify two directories as args with no other args:
         java -jar tika-app.jar <inputDirectory> <outputDirectory>


Batch Options:
    -i  or --inputDir          Input directory
    -o  or --outputDir         Output directory
    -numConsumers              Number of processing threads
    -bc                        Batch config file
    -maxRestarts               Maximum number of times the
                               watchdog process will restart the child process.
    -timeoutThresholdMillis    Number of milliseconds allowed to a parse
                               before the process is killed and restarted
    -fileList                  List of files to process, with
                               paths relative to the input directory
    -includeFilePat            Regular expression to determine which
                               files to process, e.g. "(?i)\.pdf"
    -excludeFilePat            Regular expression to determine which
                               files to avoid processing, e.g. "(?i)\.pdf"
    -maxFileSizeBytes          Skip files longer than this value

    Control the type of output with -x, -h, -t and/or -J.

    To modify child process jvm args, prepend "J" as in:
    -JXmx4g or -JDlog4j.configuration=file:log4j.xml.

This stand alone jar is the easiest way to use TIKA.

Apache Tika Example: Extracting MS Office

This is an example of using TIKA with Excel (metadata)

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.microsoft.ooxml.OOXMLParser;
import org.apache.tika.sax.BodyContentHandler;

import org.xml.sax.SAXException;

public class MSxcelParse {

   public static void main(final String[] args) throws IOException, TikaException {
      
      //detecting the file type
      BodyContentHandler handler = new BodyContentHandler();
      Metadata metadata = new Metadata();
      FileInputStream inputstream = new FileInputStream(new File("example_msExcel.xlsx"));
      ParseContext pcontext = new ParseContext();
      
      //OOXml parser
      OOXMLParser  msofficeparser = new OOXMLParser (); 
      msofficeparser.parse(inputstream, handler, metadata,pcontext);
      System.out.println("Contents of the document:" + handler.toString());
      System.out.println("Metadata of the document:");
      String[] metadataNames = metadata.names();
      
      for(String name : metadataNames) {
         System.out.println(name + ": " + metadata.get(name));
      }
   }
}

When you run this program, you’ll see this output

Contents of the document:

Sheet1
Name	Age	Designation		Salary
Ramu	50	Manager			50,000
Raheem	40	Assistant manager	40,000
Robert	30	Superviser		30,000
sita	25	Clerk			25,000
sameer	25	Section in-charge	20,000

Metadata of the document:

meta:creation-date:    2006-09-16T00:00:00Z
dcterms:modified:    2014-09-28T15:18:41Z
meta:save-date:    2014-09-28T15:18:41Z
Application-Name:    Microsoft Excel
extended-properties:Company:    
dcterms:created:    2006-09-16T00:00:00Z
Last-Modified:    2014-09-28T15:18:41Z
Application-Version:    15.0300
date:    2014-09-28T15:18:41Z
publisher:    
modified:    2014-09-28T15:18:41Z
Creation-Date:    2006-09-16T00:00:00Z
extended-properties:AppVersion:    15.0300
protected:    false
dc:publisher:    
extended-properties:Application:    Microsoft Excel
Content-Type:    application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Last-Save-Date:    2014-09-28T15:18:41Z

Please refer to the book or to the pages from Apache for more.

The new Install Modules Page

The new Install Modules page

In the versions prior to 21.4 the module version numbers tented to be confusing. To fix this, there is a changed “Install Modules” page.

The new Install Modules page

This page has been redesigned to avoid confusion.

The standard modules appear in a separate secsion without version numbers. And the somewhat long names (like “Content Server Comments” are simplified to “Comments”.

The optional modules are listed in the “Installed Modules” section. They now feature “Build” instead of “Version” to clarify the meaning.

Have much fun with the new “Install Modules” page.

Rethinking smartUI Part 4-B

Forms Rest command

Some weeks ago I published a new video on Rethinking smartUI on Youtube. Now we have Rethinking smartUI Part 4-B discussing the main part of gathering and displaying thr documents data.

If you havn’t it seen yet, here is the short video. In this posts, we’ll go though the technical aspects of this new kind of approach to smartUI. This demo is a Document Pad (or short DocPad), displaying all document properties in a SCIFI GUI arrangement.

A warning at the beginning: To use this code with IE11 is a perfect way to test all js error messages. Use this code only with the newest browsers. Its tested with Chrome (V98), Chrome Canary (V98), FF Developer (95.0b4), Edge (95.0) and Firefox (93.0)

The other parts of this post were

Part 4-A The Javascript Part 1

Part 3 The Infrastructure and the CSS

Part 2 The HTML

Part 1 Overview

In the part 4A, we had discussed all the js responsible for the perimeter of the whdgetz. Not lets discuss the main part which is responsible to gather and displad this data:

docdisplay

A CSS Grid

As you can see, there are 6 panels arranged in a CSS grid.

For infos on the css, please review this post, the part 3 of this series.

So let’s start with the panel at top left.

The documents metadata

This is more or less the data which is related directly to the document. The documents node number was the output from the node picker . The nodepicker was closed by the done() callback.

Nodepicker

Here we are in the this function of the nodepicker. We extract the node from the callbacks arguments an get the id with the topmost arrow. We extract the name of the node and put this name inside the id #document.

The loadDescriptions function does the work.

loadDescriptions

The prelude is simply to select the first face “.face.one”

Prelude
Begin

If this is not undefined (remember, smartUI always makes 2 runs, so its always a goot idea to test if its defined) the create and modify dates are extracted and translated in a standard js data. For non-US readers it will be always a difference between p.ex 04-05-20 and 4.Mai 2020 (US and german dates for the Star Wars day May the fourth), thats why we translate the dates.

Also we need to get the users of the creation and the modification. But these are numbers, so we want to translate them to names.

Next, extract the server from the connection and construct the members REST command to get these names.

First view: The fetch command

fetch is new in js 6. In this older, antique times you would have used some ajax variants like xmlhttprequest or some similar methods which we will use in other calls for comparism.

Fetch command

Technically, we have to issue two REST calls to /member/ to get the names of the createuser and the modifyuser. We use the fetch command.

Remark: the famous async/await would be much more handy for that, but we wanted to limit the language scope to js6 for these posts.

Once we get the responses, we’ll put that names simply as innerHTML on the panel.

Technically, you can use all other avaliable methods to put text on the panel, from template-strings to create a and fill a text nodein the DOM. You can even invite handlebars to do this for you.

loadDocumentThumbnail

In the top middle panel we added the document thumbnail, which is created automatically during indexing on the server.

Thumbnail

We must enter the nodeid in the REST command /thumbnails/medium/content to get the medium resolution version of the thumbnail.

To show the diffence to the fetch comand, the old archaic XMLHttpRequest was used.

The receiving image is put into a div with the id “Thumbnail”.

Image Correction

In the case the user selects another document the old thumbnail would remain. So we remove the old image element.

Almost done, we need to put our otcsticket in the request header and to send the request to the server.

loadNodeData

In this function, we use exactly one REST call to get all data at once. This is done by the function /forms/update?id=xx whick will deliver all data for that nodeid at once. Expecially the categories take a while, so a css-fog was used to cloak the image of the approaching grid until the data was received (revisit the video). Then the css fog is cleared and trhe categories are displayed.

The call is also done with the old XMLHttpRequest to show the diffences to the modern fetch command.

Local functions were used instead of those in “this” to keep the scope clean.

forms/update

The categories and the attributes

Categories

Categories were returned in an object with their category name a title in the entry. To get the attributes we have to do a little bit more.

Attributes

We split the result into several arrays to extract the values. If we have “date” in the type field, we have to use our date translation also on that to display the dates correct.

Security Clearances

All security related data is on the fouth face, the one on the lower left.

Security Clearances

Here, all security levels and markings were displayed inside a span.

Records Management Data

here we extract and fill the data on the lower middle panel.

Records Management

The Versions Data

Here, the REST commend has a problem. Versions are not included in the answer of the REST command, at least in the Content Server versions 21.3 and 21.4. So let’s ionform the user on this fact and display a local language string of this fact.

Tip: Maybe there will be a patch to fix this in the near future.

Versions Data

So we had all parts discussed.

We offer a one day remote training to understand the javascript code. If you are already a sophisticated Javascript Developer, you can get the free Sources also from https://github.com/ReinerMerz/reinerdemo (a public repository on Github).

Warning: This is only the sourcetree of the project, so you have to insert this in your own project file.

The data returned from the formsupdate?id=nn REST command

The whole data structure is send back in response to a forms/update?id=nnn REST call. Some of these entries take quite a while, so try using some css to cache this.

Forms Rest command

Have fun on discovering the endless possibilities of Dashboards and other Contentserver smartUI extensions using javascript6 and css3.

The sky is the limit.

Rethinking smartUI Part 4-A

Rethinking smartUI

Three weeks ago I published a new video on Rethinking smartUI on Youtube. Now we have Rethinking smartUI Part 4-A discussing Javascript. If you havn’t it seen yet, here is the short video. In this posts, we’ll go though the technical aspects of this new kind of approach to smartUI. This demo is a Document Pad (or short DocPad), displaying all document properties in a SCIFI GUI arrangement.

This is Part 4-A of a multipart post covering Javascript of DocPad. For the css refer to part 3 of Rethinking smartUI.

Warning: This is a lot of js code, so I subdivided this Part 4 into Part 4-A and 4-B (getting all documents infos with 1 Rest call). Part 4-B will be published next week.

We offer a one day remote training to understand the javascript code. If you are already a sophisticated Javascript Developer, you can get the Sources also from https://github.com/ReinerMerz/reinerdemo (a public repository on Github).

Warning: This is only the sourcetree of the project, so you have to insert this in your own project file.

The post parts until now

Part 4-A The Javascript Part 1

Part 3 The Infrastructure and the CSS

Part 2 The HTML

Part 1 Overview

But first to Rethinking smartUI Part 4-A.

The Javascript auxiliary files

Using the model factory

The idea of a model factory is to combine several functions in one.

a. Create a model and return it.

b. Load server data into this model for use in the main constructor function

A typical factory looks like this:

// the factory for the model
define([
‘csui/utils/contexts/factories/factory’, // Factory base to inherit from
‘csui/utils/contexts/factories/connector’, // Factory for the server connector
‘ademo/widgets/docpad/impl/docpad.model’ // Model to create the factory for
], function (ModelFactory, ConnectorFactory, DocpadModel) {

‘use strict’;
var DocpadModelFactory = ModelFactory.extend({
propertyPrefix: ‘docpad’,
constructor: function DocpadModelFactory(context, options) {
ModelFactory.prototype.constructor.apply(this, arguments);
var connector = context.getObject(ConnectorFactory, options);
// Expose the model instance in the property key on this factory instance to be // used by the context
this.property = new DocpadModel(undefined, {connector: connector});
},
fetch: function (options) {
return this.property.fetch(options);
}
});
return DocpadModelFactory;
});

The main Javascript files

The lang.js file(s)

The language system consists of those strings which are pupposed to change per language.

define({
helloMessage: ‘Hello {0} {1}!’,
waitMessage: ‘Please wait’,
seldoc: ‘Select a Document’,
printdoc: ‘Print the selected Document’,
pickerTitle: ‘Select a Document’,
selectPickerButtonLabel: ‘Select’,
descNodeID: ‘Node-ID’,
desDescription: ‘Description’,
desCreated: ‘Created’,
desModified: ‘Modified’,
……more …..
});

The standard model from Backbone.js

define([
‘csui/lib/backbone’,
‘csui/utils/url’
], function (Backbone, Url) {
‘use strict’;

let DocpadModel = Backbone.Model.extend({
defaults: {
name: ‘Unnamed’
},

constructor: function DocpadModel(attributes, options) {
Backbone.Model.prototype.constructor.apply(this, arguments);

// Enable this model for communication with the CS REST API
if (options && options.connector) {
options.connector.assignTo(this);
}
},
// Computes the REST API URL using the connection options
// /auth returns information about the authenticated user
// usage of => not possible because of “this” is used for urls
url: function () {
return Url.combine(this.connector.connection.url, ‘/auth’);
} ,

parse: (response) => response.data

}];

return DocpadModel;

}];

 

Remark: The Arrow Function at parse is pure js6. But let me quote developer.mozilla.com here

An arrow function expression is a compact alternative to a traditional function expression, but is limited and can’t be used in all situations.

Differences & Limitations:

(Endquote)

This means, its not possibe to use when any limitation will take place, like in the constructor.

The docpad.view.js

The stages of docpad.view.js

Get infos via the Bootstrap model

constructor: function DocpadView(options) {
options.model = options.context.getModel(DocpadModelFactory);
Marionette.ItemView.prototype.constructor.call(this, options);
this.listenTo(this.model, ‘change’, this.render);
}

First, we initiate the model (and read the data) via a standard model using the model factory. Then we call the constructor in the prototype with the cs options. Lastly we listen to a change of the data and (in case) we rerender the widget.

Load the photo

The user data contains (eventually) a photo of the user. This is loaded and inserted in the DOM with the function loadPhoto. Here the js6+ fetch function is exposed to get the photo and insert it into after the element woth the .photo class in the dom.

loadPhoto: function () {
let server = this.model.connector.connection.url;
server = server.substr(0, server.search(“api/v1”));
let url = server + this.photo_url;
let ticket = this.model.connector.connection.session.ticket;
fetch(url, {method: ‘GET’, headers: {“OTCSTicket”: ticket}})
.then(response => response.blob())
.then(function (myBlob) {
const URL = window.URL || window.webkitURL;
let photo = URL.createObjectURL(myBlob);
let img = document.createElement(“img”);
img.classList.add(“photo”);
img.src = photo;
document.querySelector(“#photo”).appendChild(img);
setTimeout(() => {
URL.revokeObjectURL(photo);
}, 100); // cleanup
}
);
},

       
Using the Node-Picker

The Node-Picker is a standard function in the sdk, so we use it.

The file ‘csui/dialogs/node.picker/node.picker’, // the csui node picker

is required at the beginning of the docpad.view.js file under the local name of NodePicker like this:

define([
‘csui/lib/underscore’, // Cross-browser utility belt
‘csui/lib/marionette’, // Marionetter
‘csui/lib/moment’, // the date/time lib in csui
‘ademo/widgets/docpad/impl/docpad.model.factory’, // Factory for the data model
‘csui/dialogs/node.picker/node.picker’, // the csui node picker
‘i18n!ademo/widgets/docpad/impl/nls/lang’, // Use localizable texts
‘hbs!ademo/widgets/docpad/impl/docpad’, // Template to render the HTML
‘css!ademo/widgets/docpad/impl/base’, // base Stylesheet needed for this view
‘css!ademo/widgets/docpad/impl/adv’, // adv stylesheet for this app
‘css!ademo/widgets/docpad/impl/print’ // print style sheet
], function ( _, Marionette, Moment, DocpadModelFactory, NodePicker, lang, template)

The picker is started by pressing the big button, this actually calls showThePicker. Types of 144 (only Documents) and on the enterprise volume are among the start conditions.

The title and other strings are extacted from the appropriate lang.js language file(s). The command nodePicker.show() at the beginn of the chain displays the nodePicker

showThePicker: function () {
if (undefined === this) {
return;
}
let btn = document.querySelector(“.btn”);
btn.classList.remove(“animate-large”);
btn.classList.add(“animate-large-backward”);

let nodePicker = new NodePicker({
connector: this.model.connector,
selectableTypes: [144],
dialogTitle: lang.pickerTitle,
selectButtonLabel: lang.selectPickerButtonLabel,
startLocation: ‘enterprise.volume’
});
nodePicker.s how()
.fail(function () {
console.error(“Picker fails to show”);
})
.done(_.bind(function (args) {
document.querySelector(“.printbtn”).classList.remove(“hide”);
document.querySelector(“#content”).classList
.replace(“hide”, “display”);
let node = args.nodes[0];
let id = node.attributes.id;
this.docname = node.attributes.name;
document.querySelector(“#document1”).innerHTML = this.docname;
this.loadDocumentThumbnail(id); // load the thumbnail and display it
this.loadDescriptions(node); // load node data and display it
this.loadNodeData(id); // inquire form update data and display it
}, this)
);

},

The callback in the .done clause simply extracts the node id and the name from the results and stores them in the docpad “this”.

In the next week part “Rethinking smartUI Part 4-B” we’ll discuss the REST Commad how to get all document data (nearly) at once.

Rethinking smartUI Part 3

The upper PART

Two weeks ago I published a new video on Rethinking smartUI on Youtube. If you havn’t it seen yet, here is the short video. In this posts, we’ll go though the technical aspects of this new kind of approach to smartUI. This demo is a Document Pad (or short DocPad), displaying all document properties in a SCIFI GUI arrangement.

This is Part 3 of a multipart post covering the technical start of DocPad and the associated CSS

The post parts until now

Part 4-A The Javascript Part 1

Part 3 The Infrastructure and the CSS

Part 2 The HTML

Part 1 Overview

A. The technical infrastructure

Die DocPad is started via the index.html, found in the test folder of the docpad. This is because of its much easier to test modifications via index.html compared to installing the finished widget in the content server and trying it then.

The first part of index.html

The first part of index html

The whole page is more or less standard. We habe 4 areas of interest in the first part. Dont forget, nuc was introduced in 21.2. In older versions nuc was not existant.

  • A. Here, you see the config is moved to the nuc area
  • B. Change A will also require to add the nuc-extensions to the list of dependancies for csui. Thed whole js code is loaded by the “ademo” extension.
  • C. My machine is on a VM. This means, if you put this on your machine, you’ll have to enter your url and your support directory (called supportPath) in the configuration.
  • D. I’ve used my ticket to provide the login. You can use username and password to login the more convential way or also provide a ticket of youfself here. The displayed ticket won’t work at your site.

The second part of index.html

Part 2 of index html

Here we have this points of interest

  • A. The require parts of needed modules. It is a docpad.mock.js in the list of modules, but its not recommended to use the mockup facility because there is lot of traffic from and to the server. Its much easier to add a real content server behind that index.html.
  • B. Standard procedure. Instanciate the widget with the context, display it in the region and fetch the context from the server. As mentioned, its easiere to use a real server as data source so switch the mockup off.
  • C. The marionette region is associated with the id=”widget” of this div, so the widget is displayed here.

At the end there is immediate function starting with

(function() {
var ws = new WebSocket('ws://' + window.location.host + '/jb-server-page?reloadServiceClientId=1');

This is inserted by Webstorm (my JS IDE) to allow auto-reload of changed files. You can ignore that, other IDEs will solve that differently.

B. The CSS

The CSS

Here, all css files used in the docpad are listed in the require.js list of required modules. For demo purposed and to keep things simple, I used 3 css files:

base.css

This file contains the basic css instructions to style the .hbd template shown at the top of list.

adv.css

contains the more advanced css instructions for docpad

print.css

is the print style sheet to provide a proper output on paper.

Disclaimer: The css instructions selected do not impose or propose any styles to be followed, instead they are used for pure demo purposes here.

The base.css File

Here the main functionality of the css is found. One of the feature of css3, the variables, is used. In the pseudo element :root all variables used in different rules are defined.

css setup

At the end of the snipped you’ll see a media query, saying the following is used tor screen display.

Lets discuss several stages:

The header + the footer

Header
Footer

These two structures exploit the standard userinfo with REST from the content server in a standard backbone model. “Hello xxxxxx” is a structure from the hello world demo with the word “hello” and the signed in user. The photo is downloaded from the content server and dosplayed.

The bottom structure simply prints out other user data on the screen .

Points of interest:

Font used

The nice digital effect is build with the font “digital-7”

The header is done with

The header

Ther photo is inserted at the position of the .photo class in the size of 80×60 pixel. Then the photo gets this nioce rounded look using border-radius.

the footer conststs of three divs, which are positioned in 3 columns

The footer

The divs have the classes .boxcell and .leftbox and are descendants of the .footerbox

The document display

The document display consists of six divs and are positioned using a css grid.

docdisplay

They have numbers, the numbers range from one (“card1”) to 6 (“card6”). The css part is (shown here only a fraction):

Cards

They are all children of the cube, using the display mode grid and displaying the 6 cards in two rows and threee columns.

cube

Using the diffent REST methods in the js code, these cards were filled by the standard methods of DOM interaction from js. This will be the main theme of next weeks post, the js code of “Rethinking smartUI” Part 3.

Rethinking smartUI Part 2

Last week I published a new video on Rethinking smartUI on Youtube. If you havnt seen yet, here is the short video. In this posts, we’ll go though the technical aspects of this new kind of approach to smartUI. This demo is a Document Pad (or short DocPad), displaying all document properties in a SCIFI GUI arrangement.

This is Part 2 of a multipart post covering all aspects of this Docpad.

The post parts until now

Part 4-A The Javascript Part 1

Part 3 The Infrastructure and the CSS

Part 2 The HTML

Part 1 Overview

The html

First, lets take a look on the html (in out case the .hbs file). This is the top part of the widget,

The HBS file, Part 1

The div with the class “topbox” contains the Username and the Photo. There is also a Print-Button, if a document is selected.

A button, class “picker” is sorted in a div with “picker”. This is the button used to start the node picker component. Its hidden now, will be used later, when a document is selected.

The documents data is displayed using a cube. A cube has six sides so all sides can be presented using a css grid or …. (arranged in a 3D cube which can be rotated in 3D. But thhis is not part of this demo).

The whole thing is arranged in a div with the id of “content”. The top element is a div with the class “cube”, holding six divs for the six sides

The html of the cude

Just for fun, the cube is surrounded by a div with the class “cam”. If you want to display the cube as 3d cube. This is used to set the perspective on a 3d Cube. If you are interested in learning more on 3D css cubes, refer to https://css-tricks.com/simplifying-css-cubes-custom-properties/

Lets see the Actions.

The Actions in Detail

After starting the aplication, there are several stages, They are

a. Application started

Docpad started
Docpad started

Here, the application is just started. the “Select a document” started from the top left and moved in the center of the window. This button will initiate the nodepicker, a standard component of the smartUI library.

Until now, we queried the user logged in. Then, we greet the user, extract the photo from the Content Server Database and display this photo in the top right. To make it more interesting, we displayed the photo in round form via css.

In the three panels at the bottom we displayed the rest of the user record. Just for fun we surrounded the three botton panels with a border animation.

b. Nodepicker selected

The next stage is the nodepicker. After pressing the “Select a document” button, it is started and delivers a list folders and documents on the associated Content Server. Meanwhile, the Button moved again back to upper left corner.

c. Document selected

After the user selects a document, all data of this document is read from the server. In this case, the “APAC Expense Report.xlsx” is selected. (Refer to part 1 to see which REST commmands are used).

As of do have a document, we can display the header and the print button.

As you can see when playing the video, the cube starts in some distance and is a little but foggy during the approach and clears when we receive all data.

Display of the document

The reason is, that the forms/update REST call takes longer and we use CSS animations ans css view filters to hide that fact from the user. When in doubt, view the video on top of this post again.

For beauty there are also some hovering effects added, on the panels and on the thumbnail.

d.Document to Print

The last stage is (besides selecting a new document to display) is to print the document.

Here we have the print previev in Chrome

Print stylesheet in action

As we can see. the node data is printed in the form of a css grid on paper. The display is in standard layout. A media query selects a print style sheet instead of the normal stylesheet(s).

In the next week we discuss in Part 3 the index.html to start the widget standalone and css stylesheets (there are really several). We’ll then conclude this series in Part 4 discussing the Javascript made for this demo.

We’ll use the 4 steps of the application as the schema for Part 3 and 4.

See you next week!

Rethinking smartUI

Rethinking smartUI

Yesterday I published a new video on Rethinking smartUI on Youtube. If you havnt seen yet, here is the short video. In this posts, we’ll go though the technical aspects of this new kind of approach to smartUI. This demo is a Document Pad (or short DocPad), displaying all document properties in a SCIFI GUI arrangement.

This is Part 1 of a multipart post covering all aspects of this Docpad.

The post parts until now

Part 4-A The Javascript Part 1

Part 3 The Infrastructure and the CSS

Part 2 The HTML

Part 1 Overview

Part 1: Overview

This demo is supposed to do this:

  • Demonstrate the usage of html5, javascript 6 and css3. Especially how to use this modern components inside the smartUI framework. For example, the famous parse function in the standard demo widget will change this way:

js5
parse: function (response) {
// All attributes are placed below the data key
return response.data;
}

js6
parse: (response) => response.data

  • Compare the methods of getting data via REST using the methods
    • Model (as usual in Backbone)
    • XMLHttpRequest (the good old XMLHttpRequest as used since long)
    • Fetch (the modern javascript 6 method to get REST data)
    • Not used: async/await. Despite the fact that async/await is very handy, this appeared in javascript 8 and is beyond the scope of the demo.
  • Exploit the power of css by using
    • different display methods as Grid and Table
    • use panels in a cube. Origionally this was planned as moveable 3D cube. In this case, all metadata were written on the sides of the cube. Here is a very good technical description how to do it. Unfortunately the cube idea seems to be very impractical for the demo and was dropped
    • “burning borders”, fog and animations as element of the UI
    • provide an additional print style sheet to make this screen infos printeable in a normal way
  • Except some Icons of Bootstrap, nothing is to be used from Bootstrap. These Icons can be replaced by those in the fontawesome font
  • JQuery is not used
  • These Content Server REST commands are used
    • The log in is done in the index.html page, but when used as a dashboard widget the user is already logged in. If a separate login panel is required (p.ex. for a stand aloane dashboard on a tablet) then its necessary to provide a login via post […]/v1/auth
    • then the user info is retrieved via get […]/v1/auth
    • if the user has stored a photo in his profile, then the photo is retrieved by get […][photo-url] and then inserted in the DOM.
    • Then the standard Node-Picker Control is called. All Node-Picker interactions with the server are managed by this control. As a result, the nodeid of the selected Document is returned.
    • Next, the DAPINode infos of the selected node are retrieved by get […]/v1/nodes/{id}
    • for the “created_by” and “modified_by” there is only a nujmber returned, the userid. For this two cases, the username is retrieved from the server by get […]/v1/members
    • Its always nice to exploit the thumbnail of the document to the user. This is done by get […]/nodes/{id}/thumbnails/medium/content
    • To get all other data , a special trick is used. The command get […]/v1/forms/nodes/update returns all categories and the othere nodeinfos using one REST call. This commmand takes a while to execute, so the user is entertained using some css animations.

In the next week, we explore the infrastructure of this Docpad demo.