Tuesday, March 17, 2009

Data mining

Data mining is the process of extracting hidden patterns from data. As more data is gathered, with the amount of data doubling every three years, data mining is becoming an increasingly important tool for transforming this data into information. It is commonly used in a wide range of profiling practices, such as marketing, surveillance, fraud detection and scientific discovery. Data mining can be applied to data sets of any size. However, it can only uncover patterns that are already present in data that have been collected; it cannot find patterns that the data do not contain, nor patterns in data that were never collected.

Data mining commonly involves the following classes of task:

* Classification - Arranges the data into predefined groups. For example, an email program might attempt to classify an email as legitimate or spam. Common algorithms include nearest neighbor, the naive Bayes classifier and neural networks.
* Clustering - Like classification, but the groups are not predefined, so the algorithm tries to group similar items together.
* Regression - Attempts to find a function which models the data with the least error. Least-squares fitting is a common method; genetic programming can also be used to search for the function.
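
The tasks listed above can be illustrated with a minimal Python sketch. It is only a toy, not a reference implementation: the function names, the sample data and the choice of algorithms (1-nearest-neighbor for classification, a one-dimensional k-means for clustering, and a least-squares line fit for regression, chosen here for brevity) are all assumptions made for illustration.

```python
import math


def nearest_neighbor_classify(train, point):
    """Classification: return the label of the closest labelled example.

    `train` is a list of (features, label) pairs; the groups are predefined
    by the labels that appear in it.
    """
    _, label = min(train, key=lambda item: math.dist(item[0], point))
    return label


def kmeans_1d(values, centers, iterations=10):
    """Clustering: group unlabelled numbers around the nearest center.

    No groups are given in advance; only starting guesses for the centers.
    """
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group (unchanged if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Hypothetical two-feature emails, e.g. (links, exclamation marks).
    emails = [((1.0, 1.0), "legitimate"), ((8.0, 9.0), "spam")]
    print(nearest_neighbor_classify(emails, (7.0, 8.0)))
    print(kmeans_1d([1.0, 2.0, 9.0, 10.0], [0.0, 5.0]))
    print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))
```

The classifier needs labelled examples, while the clustering function is given only raw values and starting centers, which mirrors the distinction drawn in the list. Real data sets would normally call for an established library rather than hand-written loops like these.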
