Tuesday, June 11, 2019

Data Mining - Questions to Answer: Essay Example

The Back-Propagated Delta Rule Network (BP) is an example of a multilayer perceptron, which contains additional hidden layers. It can process data more effectively than a single-layer network.

When a neural network is trained to make accurate predictions, the amount of training is increased, which can eventually lead to overfitting (George N. Karystinos, 2000). This occurs when the number of input variables is large compared to the number of training cases, or when the input variables are highly correlated with each other. Underfitting and overfitting are not specific to neural networks; they are also commonly encountered in methods such as kernel regression and smoothing splines. Overfitting is more likely in more complex networks, and it leads to erratic or wild predictions.

Data cleaning is the process of removing inaccurate and inappropriate data records, and it is an integral part of data processing and maintenance. In large data sets, finding errors and correcting them requires interaction with domain experts, which is expensive and time-consuming. The task is complex because it involves a comprehensive process of identifying and rectifying errors. Initially these operations were carried out manually; computational means of data cleansing evolved later, but even these are time-consuming and error-prone (Heiko Müller et al.).

3. What is the relevance of Bayes' Theorem in Data Mining? Give an example of how statistical inference can be used for Data Mining.

Most of the currently available statistical models in data mining are prone to overfitting and are also unstable (sensitive to minor changes in the data). These difficulties can be overcome with Bayesian methods of statistical mining. The reliability of these algorithms has been reviewed (J. Kolter and M. Maloof, 2003).
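As a minimal illustration of how Bayes' theorem supports statistical inference in a data mining setting, the sketch below computes the posterior probability that a record belongs to a class once a feature has been observed. The function name `posterior` and all the numbers are hypothetical and chosen only for illustration; they do not come from the essay.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(class | feature) using Bayes' theorem.

    prior               -- P(class)
    likelihood          -- P(feature | class)
    false_positive_rate -- P(feature | not class)
    """
    # Total probability of observing the feature at all (the evidence).
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% of records are in the class of interest,
# the feature appears in 90% of those records and in 5% of the others.
p = posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05)
print(round(p, 4))  # 0.1538
```

Even with a feature that is present in 90% of the class, the posterior stays low because the class itself is rare, which is the kind of correction a Bayesian method applies automatically.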
The Bayesian approach facilitates the integration of clustering and produces scalable, powerful algorithms suited to data mining. Capturing the correlations among a large number of variables is practical with the Bayesian method.

Example: When searching a sequence database for sequences (gene or protein) similar to a query, the data mining algorithm looks for matches and ranks them by a statistical measure, the expected value (E-value). The lower the E-value, the stronger the relationship between the query and the retrieved result. Since the data involved are merely strings, only statistical measures ensure a comparative account of the data sets.

4. Explain the concept of a Maximum Likelihood Estimator with an example.

This is applied in practice to predict the phylogenetic relationships of protein sequences using tree algorithms. The maximum likelihood estimator forms the basis of evolutionary prediction algorithms. The likelihood function scores how probable the given data (protein sequences) are under each candidate relationship. The algorithm eventually finds, for each sequence, its most likely relative among the other sequences in the data set using the maximum likelihood estimator, and hence it is easy to predict the ancestral route as well as how
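The principle behind a maximum likelihood estimator, choosing the parameter value under which the observed data are most probable, can be sketched on a far simpler problem than tree inference. The example below is an assumption-laden toy, not a phylogenetic algorithm: it estimates a single Bernoulli probability (for instance, the chance that two aligned positions match) by searching a grid of candidate values. The names `log_likelihood` and `mle_grid` and the counts are hypothetical.

```python
import math


def log_likelihood(p, successes, trials):
    """Log-likelihood of Bernoulli parameter p given the observed counts."""
    return successes * math.log(p) + (trials - successes) * math.log(1.0 - p)


def mle_grid(successes, trials, steps=999):
    """Return the grid value of p that maximizes the log-likelihood."""
    # Candidates 0.001, 0.002, ..., 0.999 (endpoints excluded to avoid log(0)).
    candidates = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(candidates, key=lambda p: log_likelihood(p, successes, trials))


# Hypothetical data: 7 matches observed in 10 aligned positions.
p_hat = mle_grid(successes=7, trials=10)
print(p_hat)  # agrees with the closed-form estimate successes / trials = 0.7
```

A tree-building method applies the same idea at a larger scale: each candidate tree plays the role of a candidate parameter value, and the one with the highest likelihood is retained.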
