Data Mining

Published by James Taylor

The text mining engine's algorithm finds frequent patterns by calculating the current frequent patterns as well as the previous frequent patterns. This is an essential aspect of data mining because the items in a database can combine into a huge number of itemsets. With this in mind, it is worth noting that mining frequent itemsets in a large transaction database can be a challenging exercise. It is on this premise that the Apriori algorithm, among other data mining processes, becomes necessary. Mining begins with a scan of the database to find the frequent 1-itemsets. These form the basis for generating the candidate 2-itemsets. Once the candidates have been checked against the database and the frequent 2-itemsets identified, the mining process continues in the same way. The Apriori algorithm keeps identifying larger frequent itemsets, level by level, until no further candidates can be generated.
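
The level-wise search described above can be sketched in a few lines of Python. This is a minimal illustration, not the engine's actual implementation: the function name `apriori`, the representation of transactions as sets of terms, and the use of an absolute count for `min_support` are all assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for every frequent itemset.

    min_support is an absolute count (an assumption for this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # First scan: count single items to find the frequent 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join frequent (k-1)-itemsets into candidate k-itemsets and keep
        # only candidates whose (k-1)-subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Scan the database again to count each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1  # stops once a level yields no frequent itemsets

    return frequent
```

Calling `apriori(list_of_term_sets, 3)` returns every itemset that appears in at least three transactions. Each pass of the `while` loop rescans the whole database, and that repeated scanning is the main cost of the approach.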

It is, however, worth noting that this approach has been criticized as laborious, which has prompted some improvements. Its candidate generation and repeated database scans have also been described as expensive, making it important to design a better text mining engine. This led to a data mining process in which the complete set of frequent itemsets is mined without generating candidates. The FP-growth method was designed to make data mining easier and less expensive. It follows a divide-and-conquer technique in which the first scan of the database derives a list of frequent items sorted in descending order of frequency. The process is made possible by an FP-tree, which retains the itemset association information. The FP-growth algorithm breaks the search for long frequent patterns into searches for shorter ones, substantially reducing search time and cost.
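
To make the FP-tree idea concrete, the following is a compact sketch of FP-growth in Python. It is a simplified illustration rather than a tuned implementation: the names `Node`, `build_tree`, `_mine`, and `fp_growth` are hypothetical, `min_support` is assumed to be an absolute count, and items are assumed to be strings so that ties in frequency can be broken alphabetically.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, its count, and links to parent and children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(weighted, min_support):
    """Build an FP-tree from (items, count) pairs; return (header table, frequencies)."""
    freq = defaultdict(int)
    for items, count in weighted:
        for item in set(items):
            freq[item] += count
    freq = {i: c for i, c in freq.items() if c >= min_support}

    root = Node(None, None)
    header = defaultdict(list)  # item -> all tree nodes holding that item
    for items, count in weighted:
        # Keep frequent items only, in descending order of frequency.
        ordered = sorted((i for i in set(items) if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, freq


def _mine(weighted, min_support, suffix):
    header, freq = build_tree(weighted, min_support)
    patterns = {}
    for item, support in freq.items():
        pattern = suffix | {item}
        patterns[pattern] = support
        # Conditional pattern base: the prefix path above each node for item.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        # Divide and conquer: mine the smaller conditional database.
        patterns.update(_mine(base, min_support, pattern))
    return patterns


def fp_growth(transactions, min_support):
    """Return {frozenset(itemset): support count} without generating candidates."""
    return _mine([(t, 1) for t in transactions], min_support, frozenset())
```

In this sketch no candidate itemsets are generated. Each recursive call works on a conditional pattern base that is smaller than the database it came from, which is how long patterns are found through shorter ones.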

Text post-processing calculates the percentage of the support values for the current news and for the news that arrives some time later. The percentage passed on from the previous module is then used to conclude whether the news is emerging or not. Text post-processing must ensure that the user can interpret the extracted knowledge easily. In this case, the essential question is whether or not there is emerging news. This is answered through a knowledge filtering process carried out using decision trees and decision rules. The objective of post-processing is to extract meaningful results that support decision making. Deriving the percentage of the support values for the news in question may be challenging, because varying degrees of redundancy make it difficult to draw conclusions. Nevertheless, it is advisable to use rule truncation in post-processing, which improves the performance of the classifier. Where a decision tree is used, decision rules may be adopted to shrink the tree so that it is easier to understand. Failing to do so could be detrimental, especially if the tree grows unusually large and makes the news difficult to interpret.
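
One possible reading of the support comparison is sketched below in Python. The text does not specify how the percentage is computed or what cut-off is used, so the relative-growth formula, the 50 percent default for `min_growth_pct`, and the function names `support` and `is_emerging` are all assumptions chosen for illustration.

```python
def support(pattern, documents):
    """Fraction of documents (sets of terms) that contain every term in pattern."""
    if not documents:
        return 0.0
    pattern = set(pattern)
    return sum(1 for d in documents if pattern <= set(d)) / len(documents)


def is_emerging(pattern, earlier_docs, later_docs, min_growth_pct=50.0):
    """Return (emerging?, percentage change in support between the two windows).

    The threshold min_growth_pct is a hypothetical cut-off, not a value
    taken from the text.
    """
    before = support(pattern, earlier_docs)
    after = support(pattern, later_docs)
    if before == 0.0:
        # A pattern absent earlier counts as emerging if it appears at all.
        if after > 0.0:
            return True, float("inf")
        return False, 0.0
    growth_pct = (after - before) / before * 100.0
    return growth_pct >= min_growth_pct, growth_pct
```

For example, a pattern found in 1 of 4 earlier articles and 3 of 4 later articles has its support rise from 0.25 to 0.75, a change of 200 percent, so it would be flagged as emerging under the assumed threshold.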

Besides knowledge filtering, it is also critical to evaluate the performance of the system. This is a major step towards determining whether or not the news is emerging. It is at this stage that the classifier model is evaluated in terms of classification accuracy, computational complexity and margin of error, among others. This process can be made easier by using a confusion matrix, which makes it easy to evaluate the performance and classification results of the clustering process. This tool will be essential in highlighting the details of emerging and existing news. Data visualization will also be an integral component of text post-processing, as it too helps in deciding whether emerging news is present.
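
As a small illustration of the evaluation step, the sketch below builds a two-class confusion matrix in Python and derives accuracy and error rate from it. The label names "emerging" and "existing" follow the text, while the function names and the dictionary layout are assumptions made for the example.

```python
from collections import Counter


def confusion_matrix(actual, predicted, positive="emerging"):
    """Count true/false positives and negatives for a two-class problem."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if a == positive:
            counts["tp" if p == positive else "fn"] += 1
        else:
            counts["fp" if p == positive else "tn"] += 1
    return {key: counts[key] for key in ("tp", "fp", "fn", "tn")}


def accuracy(cm):
    """Share of all items that were classified correctly."""
    total = sum(cm.values())
    return (cm["tp"] + cm["tn"]) / total if total else 0.0


def error_rate(cm):
    """Share of all items that were misclassified."""
    total = sum(cm.values())
    return (cm["fp"] + cm["fn"]) / total if total else 0.0
```

The four cells show where the classifier goes wrong: `fp` counts existing news wrongly flagged as emerging, and `fn` counts emerging news that was missed.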
