Semantic analysis
Tuesday, August 25th, 2009
I realize it is a little unfair to talk about features that are not yet available in the downloadable versions, but I would like to share the ideas behind our experimental semantic features.
The “mechanical” recognition and extraction algorithms used in most views of the Hub rely mainly on a combination of DOM analysis (when dealing with HTML pages) and morphological recognition of objects and strings. These techniques are very efficient for simple data scraping, but they are not sufficient when we need to selectively extract data about particular themes or topics. We are currently adding semantic capabilities to our extractors (in professional applications only, for now).
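To make the “mechanical” approach concrete, here is a minimal Python sketch that combines a structural cue from the markup (a `mailto:` link) with pattern-based recognition of strings (a regular expression run over the text nodes). It is only an illustration and not the Hub's actual code: the `EmailExtractor` class, the `extract_emails` helper and the deliberately loose `EMAIL_RE` pattern are all assumptions made for this example.

```python
import re
from html.parser import HTMLParser

# Hypothetical pattern: deliberately simple, not a full e-mail validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class EmailExtractor(HTMLParser):
    """Collects e-mail addresses from link attributes and from text nodes."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (source, address) pairs

    def handle_starttag(self, tag, attrs):
        # Structural cue: an <a href="mailto:..."> element.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.startswith("mailto:"):
                self.found.append(("link", value[len("mailto:"):]))

    def handle_data(self, data):
        # Shape-based cue: anything in the text that looks like an address.
        for address in EMAIL_RE.findall(data):
            self.found.append(("text", address))


def extract_emails(html):
    """Return (source, address) pairs in document order."""
    parser = EmailExtractor()
    parser.feed(html)
    parser.close()
    return parser.found


page = ('<p>Contact: <a href="mailto:sales@example.com">Sales</a> '
        'or write to info@example.org.</p>')
print(extract_emails(page))
```

A sketch like this finds every address on a page, whatever the page is about, which is exactly the limitation described above: nothing in it knows what topic the surrounding text deals with.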
At the moment, we are focusing only on statistical analysis of words and phrases, without performing any syntactic analysis of the texts. Even so, the results are very promising and seem to confirm our original ideas.
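As a rough illustration of what a purely statistical approach can look like, the Python sketch below counts words and two-word phrases in a text and measures how much of it belongs to a theme vocabulary, with no parsing of sentence structure. The statistics we actually use are not described here, so everything in this example is an assumption: the `terms` and `theme_score` functions, the `REAL_ESTATE` vocabulary and its weights are invented for the illustration.

```python
import re
from collections import Counter

WORD_RE = re.compile(r"[a-z]+")


def terms(text):
    """Return the lowercased words of a text plus its adjacent-word phrases."""
    words = WORD_RE.findall(text.lower())
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + bigrams


def theme_score(text, theme_weights):
    """Weighted share of the text's terms that belong to the theme."""
    counts = Counter(terms(text))
    total = sum(counts.values())
    if total == 0:
        return 0.0  # empty text: nothing to score
    hits = sum(counts[term] * weight for term, weight in theme_weights.items())
    return hits / total


# Hypothetical theme vocabulary; phrases can weigh more than single words.
REAL_ESTATE = {"apartment": 1.0, "rent": 1.0, "square meters": 2.0}

listing = "Apartment for rent, 80 square meters, near the station"
sports = "The match ended with a late goal"

print(theme_score(listing, REAL_ESTATE))
print(theme_score(sports, REAL_ESTATE))
```

The first text scores well above zero and the second scores exactly zero, so a threshold on the score could decide whether an extractor keeps the data found on a page. In practice the weights would be estimated from sample texts rather than written by hand.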