Content Server example project with Natural Language Processing (NLP)

This is a Content Server example project with an autocategorizer. It follows these guidelines:

Definition of categories. Which attributes and categories should become components?
Avoiding prejudice. Do the new categories imply prejudice, and if so, what impact does it have?
What is a good evaluation of a test run?
Choosing the categorizer: neural network or simple algorithm? If simple algorithms are used, several should be selected and tested.
Is a pre-trained open-source categorizer available, or does the categorizer still need to be trained?
Selection of the training data set and the test data set. Although more data is generally better, about 10% of the available data is enough for training and testing to get started. Keep the question of prejudice in mind when selecting the data.
Data transfer.
Possibly pre-process the data using natural language processing (NLP) tools.
Training and testing each selected algorithm. Assessing accuracy with the framework's evaluation tools.
Selection of the algorithm or neural network with the best ratings from the test runs.
Production run: A new business workspace or document is transferred to Python. The categorizer categorizes it and enters the new categories/attributes in Content Server for that document or business workspace.
Logs are generated as needed.
A trained categorizer can be saved, retrained and used over and over again.
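
The training, testing, selection, and saving steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the project's actual implementation: the sample texts, the category names "finance" and "hr", and the classes NaiveBayesCategorizer and MajorityCategorizer are all hypothetical, and a real project would more likely use an NLP or machine learning framework and its evaluation tools. Writing the resulting category back to Content Server is left out because the source does not describe that interface.

```python
import math
import pickle
from collections import Counter, defaultdict

# Hypothetical labelled documents; real data would come from Content Server.
TRAIN = [
    ("invoice payment due for order", "finance"),
    ("payment reminder for open invoice", "finance"),
    ("employment contract for new employee", "hr"),
    ("vacation request from employee", "hr"),
]
TEST = [
    ("open invoice for order", "finance"),
    ("contract for employee", "hr"),
]


def tokenize(text):
    """Minimal pre-processing: lowercase and split on whitespace."""
    return text.lower().split()


class NaiveBayesCategorizer:
    """Simple algorithm 1: multinomial naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior of the category
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class MajorityCategorizer:
    """Simple algorithm 2: baseline that always predicts the most common category."""

    def fit(self, texts, labels):
        self.label = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, text):
        return self.label


def accuracy(model, samples):
    """Fraction of samples whose predicted category matches the label."""
    hits = sum(model.predict(text) == label for text, label in samples)
    return hits / len(samples)


# Train and test each selected algorithm.
train_texts = [text for text, _ in TRAIN]
train_labels = [label for _, label in TRAIN]
candidates = {
    "naive_bayes": NaiveBayesCategorizer().fit(train_texts, train_labels),
    "majority": MajorityCategorizer().fit(train_texts, train_labels),
}
scores = {name: accuracy(model, TEST) for name, model in candidates.items()}

# Select the algorithm with the best test result.
best_name = max(scores, key=scores.get)

# Save the trained categorizer and restore it for a later production run.
saved = pickle.dumps(candidates[best_name])
restored = pickle.loads(saved)
category = restored.predict("payment due for invoice")
print(best_name, scores, category)
```

In practice the pickled bytes would be written to a file so that the categorizer can be loaded again for later runs or for retraining. With only a handful of samples the accuracy figures say little; the baseline is included so that the chosen algorithm has something to be compared against.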