PREDICTIVE MODELS OF EMPLOYEE VOLUNTARY TURNOVER IN A INDIAN PROFESSIONAL SALES FORCE USING DATA-MINING ANALYSIS
|
|
|
- Donald Haynes
- 9 years ago
- Views:
Transcription
1 International Journal of Data Engineering (IJDE) Singaporean Journal of Scientific Research(SJSR) Vol.6.No Pp available at: Paper Received : Paper Accepted: Paper Reviewed by: 1. Dr. P. Surya Prakash 2. Basu Reddy Editor : Dr. Binod Kumar PREDICTIVE MODELS OF EMPLOYEE VOLUNTARY TURNOVER IN A INDIAN PROFESSIONAL SALES FORCE USING DATA-MINING ANALYSIS K. Tamizharasi Research Scholar Dept. of Computer Science Periyar University,Salem. Dr. UmaRani Associate Professor Dept. of Computer Science Sri Saradha College for Women, Salem K.Rajasekaran Asst. Professor, DB Jain College Thoraipakkam, Chennai ABSTRACT Data mining concept is growing fast in popularity, it is a technology that involving methods at the intersection of (Artificial intelligent, Machine learning, Statistics and database system), the main goal of data mining process is to extract information from a large data into form which could be understandable for further use. Some algorithms of data mining are used to give solutions to classification problems in database. In this paper a comparison among three classification s algorithms will be studied, these are (K- Nearest Neighbor classifier, Decision tree and Bayesian network) algorithms. The paper will demonstrate the strength and accuracy of each algorithm for classification in term of performance efficiency and time complexity required. For model validation purpose, twenty-four-month data analysis is conducted on a mock-up basis. Keywords: Decision tree, Bayesian network, k- nearest neighbour classifier.
2 1. INTRODUCTION Data Mining (The analysis step of the knowledge discovery in data base) a powerful new technology improved and so fast grown. It is a technology used with great potential to help business and companies focus on the most important information of the data that they have to collect to find out their customer's behaviors. Intelligent methods are applied in order to extracting data pattern, by many stages like" data selection, cleaning, data integration, transformation and pattern extraction". Many methods are used for extraction data like" Classification, Regression, Clustering, Rule generation, Discovering, association Rule etc. each has its own and different algorithms to attempt to fit a model to the data. Algorithm is a set of rules that must be followed when solving a specific problem (it is a finite sequence of computational steps that transform the given input to an output for a given problem). The problem can be a machine. Classification techniques in data mining are capable of processing a large amount of data. It can predict categorical class labels and classifies data based on training set and class labels and hence can be used for classifying newly available data. Thus it can be outlined as an inevitable part of data mining and is gaining more popularity (RAJ et al. 2012).In this paper Classification Method is considered, it focuses on a survey on various classification techniques that are most commonly used in data-mining. The study is a comparison between three algorithms (Bayesian network, K-NN classifier and Decision tree) to show the strength and accuracy of each algorithm for classification in term of performance efficiency and time complexity. 2. ALGORITHM The algorithm is a computational procedure which takes some value or set of value as input and generates some value or set as output. The result of a given problem is the output that we got after solving the problem. The algorithm is considered to be correct, if for every input instance, it generate the correct output and it gets terminated and give the desired output otherwise it does not considered as a correct algorithm. 3. ANALYSIS OF ALGORITHM A situation may occur where many algorithms are available for solving a particular problem. The data structure can be represented in many ways and many algorithms are there to implement an operation on these data structure. Here we require to comparison of two algorithms to implement an operation on these data structure and the better one is chosen. The analysis of algorithm is focus on time complexity and space complexity, as compared to time analysis the space analysis requirement for an algorithm is easer, but wherever necessary both of them are used. The space refers to storage required in addition to the space required to store the input data. The amount of memory needed by the program to run to completion referred to as Space complexity. The amount of time needed by the program to run to completion referred to as Time complexity, it is depending on the size of the input. It is a function of size: (n) [T (n)]. Best Case: It is the function defined by the maximum number of steps taken on any instance of size (n). Average Case: It is the function defined by the Average number of steps taken on any instance of size (n). Worst Case: It is the function defined by the minimum number of steps taken on any instance of size (n). 4. K-NEAREST NEIGHBOUR ALGORITHM 4.1 General view on KNN Algorithm One of the simplest non parametric lazy algorithms called as "Closest Point Search" is a mechanism that is used to identify the unknown
3 data point based on the nearest neighbor whose value is already known. easy to understand but has an incredible work in fields and practice specially in classification (it can be used in regression as well), non-parametric mean does not make assumptions on the data and that is great and useful in the real life, and lazy mean does not use training data to do generalization, that and in best case it makes decision based on the entire training data set. Figure 1 illustrates the modeling. For a data record t to be classified, its k nearest neighbors are retrieved, and this forms a neighborhood of t. Majority voting among the data records in the neighborhood is usually used to decide the classification for t with or without consideration of distance-based weighting. However, to apply KNN algorithm we need to choose an appropriate value for k, and the success of classification is very much dependent on this value. In a sense, the KNN method is biased by k. There are many ways of choosing the K value, but a simple one is to run the algorithm many times with different k values and choose the one with the best performance (GUO et al. 2003) There are three key elements: a set of labeled objects (e.g., a set of stored records) A distance or similarity metric to compute distance between objects. The value of k, the number of nearest neighbors. (WU, KUMAR et al. 2008) Advantages of KNN Algorithm: KNN is an easy to understand and easy to implement classification technique. It can perform well in many situations. Cover and Hart show that the error of the nearest neighbor rule is bounded above by twice the Bayes error under certain reasonable assumptions. Also, the error of the general KNN method asymptotically approaches that of the Bayes error and can be used to approximate it. KNN is particularly well suited for multimodal classes as well as applications in which an object can have many class labels. Disadvantages of KNN Algorithm: The naive version of the algorithm is easy to implement by computing the distances from the test sample to all stored vectors, but it is computationally intensive, especially when the size of the training set grows. 4.2 Previously researches on KNN algorithm A group of researchers in University of Ulster and Queen's University Belfast in them research find the classification accuracy on six public datasets is comparable with C5.0 and KNN and a novel KNN and they named it KNN model which has a few representatives from training dataset with some extra information to represent the whole training dataset the selection of each representative they used the optimal but different k decided by dataset itself the classification accuracy of KNN Model was higher than KNN and C5.0. the KNN Model significantly reduces the number of the data tuples in the final model for classification with a 90.41% reduction rate on average. It could be a good replacement for KNN in many applications such as dynamic web mining for a large repository (GUO et al. 2003). In (RAIKWAL, J. & SAXENA, K. 2012) did a research over a medical data set they made a comparison between KNN and SVM them result was after implementing the two algorithm showed that K-NN is a quit good classifier but when applying KNN algorithm over small data set and it is accuracy decrease when it applies over large data set it performs poor results (it s all performance parameters are varies according to the size of dataset)..svm is complex classifier and the accuracy and other performance parameters are not too much depends over dataset size but about all factors dependent over the no of training cycles.the search time of SVM remains constant doesn't depend on the size of
4 data set while search time in KNN increasing when the size of data increase(raikwal, J. & SAXENA, K. 2012). (KAREGOWDA, A. G., JAYARAM, M. & MANJUNATH, A. 2012) made a paper using cascading k-means clustering and KNN classifier over diabetic patient them result was quite good. The model consists of three stages. The first stage, K-means clustering which is one of the simplest unsupervised learning algorithms and follows partitioning method for clustering. In the second stage Genetic algorithm (GA) and Correlation based feature selection (CFS) is used in a cascaded fashion in the third stage a fine tuned classification is done using K- nearest neighbor (KNN) by taking the correctly clustered instance of first stage and with feature subset identified in the second stage as inputs for the KNN. These enhanced classification accuracy of KNN. The proposed model obtained the classification accuracy of 96.68% for diabetic dataset (KAREGOWDA et al. 2012). Graz University of Technology, University of Washington, in May 2004 Experimented on the data of a surface inspection task and data sets from the UCI repository. Bayesian network classifiers more often achieve a better classification rate on different data sets as selective k-nn classifiers (PERNKOPF, F. 2005). (VENKATESWARLU et al. 2011) made a study on Classification Algorithms for Liver Disease Diagnosis them results showed that the sensitivity of C4.5 classification algorithm and accuracy was less than KNN classifier accuracy and sensitivity (RAMANA & VENKATESWARLU et al. 2011) 5. DECISION TREE ALGORITHM 5.1 General idea of algorithm Tree structure which has been widely used to represent classification models (a classifier depicted in a flowchart) (BARROS et al. 2012). Decision tree induction algorithms, an inductive learning task use particular facts to make more generalized conclusions. Most decision tree induction algorithms are based on a greedy topdown recursive partitioning strategy for tree growth. They use different variants of impurity measures, like; information gain (BARROS et al. 2012), gain ratio (WANG et al. 2005), and distance-based measures (DE MÁNTARAS & R. L. 1991), to select an input attribute to be associated with an internal node. One major drawback of Greedy search is that it usually leads to suboptimal solutions. A predictive model based on a branching series of Boolean tests, these smaller Boolean tests are less complex than a one-stage classifier. The general form of this modeling approach is illustrated in Figure 2. Entropy of decision tree is the information gain measure, is minimized when all values of the target attribute are the same, If we know that commute time will always be short, then entropy = 0. Entropy is maximized when there is an equal chance of all values for the target attribute (the result is random), If commute time = short in 3 instances, medium in 3 instances and long in 3 instances, entropy is maximized. Calculation of entropy: S = set of examples Si = subset of S with value vi under the target attribute l = size of the range of the target attribute. Decision Trees offer many benefits in data mining technology like: Self-explanatory and easy to follow when compacted The ability of handling variety of input data: nominal, numeric and textual Ability of processing data sets that may have errors or missing values High predictive performance for a relatively small computational effort Useful for various tasks, such as classification, regression, clustering and feature selection Decision tree classifier is able to break down a complex decision making process into
5 collection of simpler and easy decision. The complex decision is subdivided into simpler decision. It divides whole training set into smaller subsets. Information gain, gain ratio, gain index are three basic splitting criteria to select attribute as a splitting point. Decision trees can be built from historical data they are often used for explanatory analysis as well as a form of supervision learning. The algorithm is design in such a way that it works on all the data that is available and as perfect as possible (PAWAR, T. & KAMALAPUR. 2011). There are many specific decisiontree algorithms: 1. ID3 ( Iterative Dichotomiser 3) 2. C4.5 Algorithm, Successor of ID3 3. CART (Classification And Regression Tree) 4. MARS: extend decision tree to better handle numerical data. Advantage Of Decision Tree:Advantages over other learning algorithms, such as robustness to noise, low computational cost for generating the model, and ability to deal with redundant attributes, Besides, the induced model usually presents a good generalization ability (HAN et al. 2006),( PANG-NING et al. 2006) Problems with Decision Tree: While decision trees classify quickly, the time for building a tree may be higher than another type of classifier Decision trees suffer from a problem of errors propagating throughout a tree a very serious problem as the number of classes increases. 5.2 Different Between Nearest Neighbor Classifiers and Decision Tree Algorithm Nearest neighbor classifiers are instance-based or lazy learners in that they store all of the training samples and do not build a classifier until a new (unlabeled) sample needs to be classified. This contrasts with eager learning methods, such a decision tree induction and back propagation, which construct a generalization model before receiving new samples to classify. Lazy learners can incur expensive computational costs when the number of potential neighbors (i.e., stored training samples) with which to compare a given unlabeled sample is great. Therefore, they require efficient indexing techniques. An expected lazy learning method is faster data training than eager methods, but slower at classification since all computation is delayed to that time. Unlike decision tree induction and back propagation, nearest neighbor classifiers assign equal weight to each attribute. This may cause confusion when there are many irrelevant attributes in the data. 5.3 Previously researches of Decision Tree (Fahad Shahbaz Khan et al. 2008), they did an experiment to examine ID3 and C4.5 in oral medicine after the experiment they selected the C4.5 decision tree algorithm because the algorithm has the ability for handling data with missing attribute values better than ID3 decision tree algorithm. It also avoids overfitting the data and reduces error pruning (KHAN et al. 2008) Author Patrick Ozer made a comparison among four classification decision tree algorithm J84, REPTree, RandomTree and LMT the J48 algorithm is WEKA s implementation of the C4.5 decision tree learner. The algorithm uses a greedy technique to induce decision trees for 20 classification and uses reduced-error pruning REPTree is a fast decision tree learner. RandomTree is an algorithm for constructing a tree that considers K random features at each node. Performs no pruning.lmt is a combination of induction trees and logistic regression. LMT uses cost-complexity pruning The LMT algorithm seems to perform better on data sets with many numerical attributes. The results showed that LMT has got the best overall performance, the performance of the J48 and RepTree algorithm were almost the same on all the data sets, RandomTree algorithm we see that it builds the largest trees (and has the lowest overall performance). For the LMT algorithm restricted the experiments done on five runs, because of the
6 time it costs to run that algorithm. For the other three algorithms one run took less than a minute, but for the LMT algorithm one run could take several hours. This algorithm is significantly slower than the other algorithms, for the four classification tree algorithms they used, that using cost-complexity pruning has a better performance than reduced-error pruning (OZER & P. 2008). (Rahul et al. 2012) modified ID3 decision tree algorithm and named it improved ID3, After testing the original ID3 algorithm and proposed improved ID3 algorithm on dataset. In improved ID3 algorithm they got more number of nodes and more number of rules which means that improved ID3 algorithm is more efficient than original ID3 algorithm. And it is differs from original ID3 in following way. It is using extra Association Function to overcome the short comings of Id3 Improved Id3 more reasonable and effective rules are generated Missing values can be considered and will not have impact on accuracy of decision. Accurate rules The accuracy of decision is more because e no of rules is more The only Disadvantages were Time complexity is more in improved ID3, but it can be neglected because now day s faster and faster computers are available (Rahul et al. 2012). October 2012 College Of Engineering,Pune, India present a paper which compare between ID3 decision tree and FID3(Fuzzy iterative dichotomizer3) which is same to ID3 but add fuzzification for improving the result of ID3, the data represent in ID3 is crisp while for FID3 they are fuzzy, with continuous attributes the main difference between ID3 is that ID3 works well with discrete values but for continues value FID3 showed better accuracy.classical decision tree has two issues like How to split the training instances and what the stopping rule is to terminate the splitting procedure. These two problems are solved with the help of fuzzy decision tree. (SURYAWANSHI, R. D. & THAKORE, D. 2012) a group of researchers in France did a comparison between Bayesian network algorithm and CART decision tree for predicting access to the renal transplant waiting list them results showed the models were complementary sensitivity, specificity and positive predictive of the two models was same result and both model had high accuracy since the Bayesian network provided a global view of the variables associations while the decision tree was more easily interpretable by physicians (BAYAT et al.). 6. BAYESIAN NETWORK ALGORITHM 6.1 General view of the algorithm Bayesian network (BN) is also called belief networks, is a graphical model for probability relationships among a set of variables features, This BN consist of two components. First component is mainly a directed acyclic graph (DAG) in which the nodes in the graph are called the random variables and the edges between the nodes or random variables represents the probabilistic dependencies among the corresponding random variables. Second component is a set of parameters that describe the conditional probability of each variable given its parents. A Bayesian network (BN) describes a system by specifying relationships of conditional dependence between its variables.the conditional dependences are represented by a directed acyclic graph, in which, each node (BAYAT et al. 2009).The general form of this modeling approach is illustrated in Figure Previously Researches of The Algorithm March 2011 group of researchers did a comparison among many classification algorithm in data mining over Heart Disease Prediction, in them comparison the result showed that the accuracy of Navi Bayesian and decision tree is so close to each other we can say that both of them have same accuracy. while the accuracy of the k-nn algorithm can be severely degraded by the presence of noisy
7 or other features, and the time taken by each algorithm showed that the Bayesian algorithm was the faster one,but by incorporated genetic algorithm with decision tree algorithm in this way Decision tree algorithm will be outperform on KNN and Bayesian algorithm in the manner of accuracy(soni et al. 2011). In many data mining classification model the decision tree and Bayesian algorithm had similar high predictive performance Bayesian networks can link more variable in complex direct and indirect ways making interpretation more complex while decision trees and provide a simpler and more direct interpretation (BAYAT et al. 2009). International Journal on Computer Science and Engineering (IJCSE) J.Sreemathy Research Scholar Karpagam University and P. S. Balamurugan Research Scholar ANNA UNIVERSITY published a paper which was about a comparison between KNN and Bayesian classification algorithm on an efficient text classification the comparison showed the Precision of Bayesian algorithm over KNN and SVM algorithm (SREEMATHY et al. 2012).the success and outperform of Bayesian algorithm over KNN appear in another set of data base which is Tuberculosis. Classify the patient affected by tuberculosis into two categories (least probable and most probable) (Hardik Maniya et al. 2011). Comparison four different data miningbased techniques including artificial neural networks (ANNs), support vector machines (SVMs), decision trees (DTs) and Bayesian networks (BNs) were investigated. Results of validation stage showed ANN had the highest classification accuracy, 96.33%. After ANN, SVM with polynomial kernel function (95.67%), DT with J48 algorithm (94.67%) and BN with simulated annealing learning (94.33%) had higher accuracy, respectively (MOLLAZADE et al. 2012). 7. CONCLUSION Due to our survey on comparison among data mining classification's algorithms (Decision tree, KNN, Bayesian) and analyzing of the time complexity of the mentioned algorithms we conclude that all decision Tree's algorithms have less error rate and it is the easier algorithm as compared to KNN and Bayesian. The knowledge in Decision Tree represented in form of [IF-THEN] rules which is easier for humans understand. The disadvantages of decision tree algorithm are typically require certain knowledge statistical experience to complete the process accurately It can also be difficult to include variables on the decision tree, exclude duplicate information. As we mentioned there are many specific decision-tree algorithms. CART decision tree algorithm is the best algorithm for classification of data, Which have shortest execution time. The result to predictive data mining technique on the same dataset showed that Decision Tree outperforms and Bayesian classification having the same accuracy as of decision tree but other predictive methods like KNN, Neural Networks, Classification based on clustering are not giving good results. up to here and due to our survey based on the previously researches we extract the fact that among (Decision tree, KNN, Bayesian) algorithms in data mining, KNN is having lesser accuracy while Decision tree and Bayesian are equal. But if Decision tree algorithm has merged with genetic algorithm then in this way the accuracy of the Decision tree algorithm will improve and become more powerful and it will arise to be the best model approach among the other two algorithms. The efficiency of results using KNN can be improved by increasing the number of data sets and for Bayesian algorithm classifier by increasing the attributes. For time issue, researches statistics we conclude that the faster algorithm for classifier respectively is: Navi- Bayes algorithm, Decision tree and finally KNN algorithm that mean the last one is the most slowly algorithm for classifier. References [1] RAJ, M. A Mrs. Bincy G, Mrs. T. Mathu. Survey on common data mining classification Technique.International Journal of Wisdom Based Computing, 2.
8 [2] GUO, G., WANG, H., BELL, D., BI, Y. & GREER, K KNN model-based approach in classification. On The Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE. Springer. [3] WU, X., KUMAR, V., QUINLAN, J. R., GHOSH, J., YANG, Q., MOTODA, H., MCLACHLAN, G. J., NG, A.,LIU, B. & PHILIP, S. Y Top 10 algorithms in data mining. Knowledge and Information Systems, 14, [4] RAIKWAL, J. & SAXENA, K Performance Evaluation of SVM and K-Nearest Neighbor Algorithm overmedical Data set. International Journal of Computer Applications, 50, [5] KAREGOWDA, A. G., JAYARAM, M. & MANJUNATH, A Cascading K-means Clustering and KNearest Neighbor Classifier for Categorization of Diabetic Patients. International Journal of Engineering and Advanced Technology (IJEAT) ISSN, [6] PERNKOPF, F Bayesian network classifiers versus selective k-nn classifier. Pattern Recognition, 38, [7] RAMANA, B. V., BABU, M. S. P. & VENKATESWARLU, N A critical study of selected classification algorithms for liver disease diagnosis. International Journal of Database Management Systems, 3, [8] BARROS, R. C., BASGALUPP, M. P., DE CARVALHO, A. & FREITAS, A. A A survey of [9] evolutionary algorithms for decision-tree induction. Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on, 42, [10] WANG, Y., MAKEDON, F. S., FORD, J. C. & PEARLMAN, J HykGene: a hybrid approach for [11] selecting marker genes for phenotype classification using microarray gene expression data. Bioinformatics, 21, [12] DE MÁNTARAS, R. L A distance-based attribute selection measure for decision tree induction. Machine [13] learning, 6, [14] PAWAR, T. & KAMALAPUR, S. A Survey on Privacy Preserving Decision Tree Classifier. [15] HAN, J., KAMBER, M. & PEI, J Data mining: concepts and techniques, Morgan kaufmann. [16] PANG-NING, T., STEINBACH, M. & KUMAR, V. Year. Introduction to data mining. In: Library of Congress, [17] KHAN, F. S., ANWER, R. M., TORGERSSON, O. & FALKMAN, G Data mining in oral medicine using decision trees. World Academy of Science, Engineering and Technology, 37, [18] OZER, P Data Mining Algorithms for Classification. [19] PATIL, R. A., AHIRE, P. G., PATIL, P. D. & GOLANDE, A. L. A Modified Approach to Construct Decision Tree in Data Mining Classification. [20] SURYAWANSHI, R. D. & THAKORE, D Decision Tree Classification Implementation with Fuzzy [21] Logic. IJCSNS, 12, 93. [22] BAYAT, S., CUGGIA, M., ROSSILLE, D., KESSLER, M. & FRIMAT, L. Year. Comparison of Bayesian Network and Decision Tree Methods for Predicting Access to the Renal Transplant Waiting List. In: MIE, [23] SONI, J., ANSARI, U., SHARMA, D. & SONI, S Predictive data mining for medical diagnosis: An overview of heart disease prediction. International Journal of Computer Applications, 17, [24] SREEMATHY, J. & BALAMURUGAN, P An efficient text classification using knn and naive bayesian. [25] International Journal on Computer Science and Engineering, 4, [26] Hardik Maniya, Mohsin Hasan, Komal Patil Comparative study of Naïve Bayes and KNN for Tuberculosis. International Journal of Computer Applications(JCA),2011. [27] MOLLAZADE, K., OMID, M. & AREFI, A Comparing data mining classifiers for grading raisins based on visual features. Computers and Electronics in Agriculture, 84,
International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
Reference Books. Data Mining. Supervised vs. Unsupervised Learning. Classification: Definition. Classification k-nearest neighbors
Classification k-nearest neighbors Data Mining Dr. Engin YILDIZTEPE Reference Books Han, J., Kamber, M., Pei, J., (2011). Data Mining: Concepts and Techniques. Third edition. San Francisco: Morgan Kaufmann
Classification algorithm in Data mining: An Overview
Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department
Performance Analysis of Decision Trees
Performance Analysis of Decision Trees Manpreet Singh Department of Information Technology, Guru Nanak Dev Engineering College, Ludhiana, Punjab, India Sonam Sharma CBS Group of Institutions, New Delhi,India
An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
Data Mining for Knowledge Management. Classification
1 Data Mining for Knowledge Management Classification Themis Palpanas University of Trento http://disi.unitn.eu/~themis Data Mining for Knowledge Management 1 Thanks for slides to: Jiawei Han Eamonn Keogh
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015
RESEARCH ARTICLE OPEN ACCESS Data Mining Technology for Efficient Network Security Management Ankit Naik [1], S.W. Ahmad [2] Student [1], Assistant Professor [2] Department of Computer Science and Engineering
DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES
DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 [email protected]
Data Mining Classification: Decision Trees
Data Mining Classification: Decision Trees Classification Decision Trees: what they are and how they work Hunt s (TDIDT) algorithm How to select the best split How to handle Inconsistent data Continuous
DATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA
ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA D.Lavanya 1 and Dr.K.Usha Rani 2 1 Research Scholar, Department of Computer Science, Sree Padmavathi Mahila Visvavidyalayam, Tirupati, Andhra Pradesh,
Data Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.1 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Classification vs. Numeric Prediction Prediction Process Data Preparation Comparing Prediction Methods References Classification
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction
COMP3420: Advanced Databases and Data Mining Classification and prediction: Introduction and Decision Tree Induction Lecture outline Classification versus prediction Classification A two step process Supervised
Predicting Student Performance by Using Data Mining Methods for Classification
BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 13, No 1 Sofia 2013 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.2478/cait-2013-0006 Predicting Student Performance
A Comparative Analysis of Classification Techniques on Categorical Data in Data Mining
A Comparative Analysis of Classification Techniques on Categorical Data in Data Mining Sakshi Department Of Computer Science And Engineering United College of Engineering & Research Naini Allahabad [email protected]
Comparison of Data Mining Techniques used for Financial Data Analysis
Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE Kasra Madadipouya 1 1 Department of Computing and Science, Asia Pacific University of Technology & Innovation ABSTRACT Today, enormous amount of data
An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset
P P P Health An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset Peng Liu 1, Elia El-Darzi 2, Lei Lei 1, Christos Vasilakis 2, Panagiotis Chountas 2, and Wei Huang
EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH
EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH SANGITA GUPTA 1, SUMA. V. 2 1 Jain University, Bangalore 2 Dayanada Sagar Institute, Bangalore, India Abstract- One
ANALYSIS OF FEATURE SELECTION WITH CLASSFICATION: BREAST CANCER DATASETS
ANALYSIS OF FEATURE SELECTION WITH CLASSFICATION: BREAST CANCER DATASETS Abstract D.Lavanya * Department of Computer Science, Sri Padmavathi Mahila University Tirupati, Andhra Pradesh, 517501, India [email protected]
Classification and Prediction
Classification and Prediction Slides for Data Mining: Concepts and Techniques Chapter 7 Jiawei Han and Micheline Kamber Intelligent Database Systems Research Lab School of Computing Science Simon Fraser
Customer Classification And Prediction Based On Data Mining Technique
Customer Classification And Prediction Based On Data Mining Technique Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor
Comparison of K-means and Backpropagation Data Mining Algorithms
Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and
Random forest algorithm in big data environment
Random forest algorithm in big data environment Yingchun Liu * School of Economics and Management, Beihang University, Beijing 100191, China Received 1 September 2014, www.cmnt.lv Abstract Random forest
Manjeet Kaur Bhullar, Kiranbir Kaur Department of CSE, GNDU, Amritsar, Punjab, India
Volume 5, Issue 6, June 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Multiple Pheromone
A Review of Data Mining Techniques
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,
Effective Analysis and Predictive Model of Stroke Disease using Classification Methods
Effective Analysis and Predictive Model of Stroke Disease using Classification Methods A.Sudha Student, M.Tech (CSE) VIT University Vellore, India P.Gayathri Assistant Professor VIT University Vellore,
Keywords data mining, prediction techniques, decision making.
Volume 5, Issue 4, April 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analysis of Datamining
An Introduction to Data Mining
An Introduction to Intel Beijing [email protected] January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail
Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier
International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-1, Issue-6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing
D A T A M I N I N G C L A S S I F I C A T I O N
D A T A M I N I N G C L A S S I F I C A T I O N FABRICIO VOZNIKA LEO NARDO VIA NA INTRODUCTION Nowadays there is huge amount of data being collected and stored in databases everywhere across the globe.
International Journal of Software and Web Sciences (IJSWS) www.iasir.net
International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) ISSN (Print): 2279-0063 ISSN (Online): 2279-0071 International
AnalysisofData MiningClassificationwithDecisiontreeTechnique
Global Journal of omputer Science and Technology Software & Data Engineering Volume 13 Issue 13 Version 1.0 Year 2013 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals
How To Solve The Kd Cup 2010 Challenge
A Lightweight Solution to the Educational Data Mining Challenge Kun Liu Yan Xing Faculty of Automation Guangdong University of Technology Guangzhou, 510090, China [email protected] [email protected]
Data Mining Project Report. Document Clustering. Meryem Uzun-Per
Data Mining Project Report Document Clustering Meryem Uzun-Per 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. K-means algorithm...
Data Mining Practical Machine Learning Tools and Techniques
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam
Supervised Learning (Big Data Analytics)
Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used
Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier
Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier D.Nithya a, *, V.Suganya b,1, R.Saranya Irudaya Mary c,1 Abstract - This paper presents,
EFFICIENT DATA PRE-PROCESSING FOR DATA MINING
EFFICIENT DATA PRE-PROCESSING FOR DATA MINING USING NEURAL NETWORKS JothiKumar.R 1, Sivabalan.R.V 2 1 Research scholar, Noorul Islam University, Nagercoil, India Assistant Professor, Adhiparasakthi College
BIG DATA IN HEALTHCARE THE NEXT FRONTIER
BIG DATA IN HEALTHCARE THE NEXT FRONTIER Divyaa Krishna Sonnad 1, Dr. Jharna Majumdar 2 2 Dean R&D, Prof. and Head, 1,2 Dept of CSE (PG), Nitte Meenakshi Institute of Technology Abstract: The world of
Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal
Learning Example Chapter 18: Learning from Examples 22c:145 An emergency room in a hospital measures 17 variables (e.g., blood pressure, age, etc) of newly admitted patients. A decision is needed: whether
Principles of Data Mining by Hand&Mannila&Smyth
Principles of Data Mining by Hand&Mannila&Smyth Slides for Textbook Ari Visa,, Institute of Signal Processing Tampere University of Technology October 4, 2010 Data Mining: Concepts and Techniques 1 Differences
BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts
BIDM Project Predicting the contract type for IT/ITES outsourcing contracts N a n d i n i G o v i n d a r a j a n ( 6 1 2 1 0 5 5 6 ) The authors believe that data modelling can be used to predict if an
IDENTIFIC ATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION
http:// IDENTIFIC ATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION Harinder Kaur 1, Raveen Bajwa 2 1 PG Student., CSE., Baba Banda Singh Bahadur Engg. College, Fatehgarh Sahib, (India) 2 Asstt. Prof.,
Decision Support System on Prediction of Heart Disease Using Data Mining Techniques
International Journal of Engineering Research and General Science Volume 3, Issue, March-April, 015 ISSN 091-730 Decision Support System on Prediction of Heart Disease Using Data Mining Techniques Ms.
Introduction to Data Mining
Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association
Machine Learning using MapReduce
Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous
A Survey on Intrusion Detection System with Data Mining Techniques
A Survey on Intrusion Detection System with Data Mining Techniques Ms. Ruth D 1, Mrs. Lovelin Ponn Felciah M 2 1 M.Phil Scholar, Department of Computer Science, Bishop Heber College (Autonomous), Trichirappalli,
Prediction of Heart Disease Using Naïve Bayes Algorithm
Prediction of Heart Disease Using Naïve Bayes Algorithm R.Karthiyayini 1, S.Chithaara 2 Assistant Professor, Department of computer Applications, Anna University, BIT campus, Tiruchirapalli, Tamilnadu,
A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier
A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier G.T. Prasanna Kumari Associate Professor, Dept of Computer Science and Engineering, Gokula Krishna College of Engg, Sullurpet-524121,
ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION
ISSN 9 X INFORMATION TECHNOLOGY AND CONTROL, 00, Vol., No.A ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION Danuta Zakrzewska Institute of Computer Science, Technical
Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification
Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Tina R. Patil, Mrs. S. S. Sherekar Sant Gadgebaba Amravati University, Amravati [email protected], [email protected]
A Review of Missing Data Treatment Methods
A Review of Missing Data Treatment Methods Liu Peng, Lei Lei Department of Information Systems, Shanghai University of Finance and Economics, Shanghai, 200433, P.R. China ABSTRACT Missing data is a common
Data Mining: A Preprocessing Engine
Journal of Computer Science 2 (9): 735-739, 2006 ISSN 1549-3636 2005 Science Publications Data Mining: A Preprocessing Engine Luai Al Shalabi, Zyad Shaaban and Basel Kasasbeh Applied Science University,
International Journal of Advance Research in Computer Science and Management Studies
Volume 2, Issue 12, December 2014 ISSN: 2321 7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online
SURVEY OF TEXT CLASSIFICATION ALGORITHMS FOR SPAM FILTERING
I J I T E ISSN: 2229-7367 3(1-2), 2012, pp. 233-237 SURVEY OF TEXT CLASSIFICATION ALGORITHMS FOR SPAM FILTERING K. SARULADHA 1 AND L. SASIREKA 2 1 Assistant Professor, Department of Computer Science and
Knowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Evaluating the Accuracy of a Classifier Holdout, random subsampling, crossvalidation, and the bootstrap are common techniques for
Enhanced Boosted Trees Technique for Customer Churn Prediction Model
IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V5 PP 41-45 www.iosrjen.org Enhanced Boosted Trees Technique for Customer Churn Prediction
Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Heart Disease Diagnosis Using Predictive Data mining
ISSN (Online) : 2319-8753 ISSN (Print) : 2347-6710 International Journal of Innovative Research in Science, Engineering and Technology Volume 3, Special Issue 3, March 2014 2014 International Conference
IDENTIFYING BANK FRAUDS USING CRISP-DM AND DECISION TREES
IDENTIFYING BANK FRAUDS USING CRISP-DM AND DECISION TREES Bruno Carneiro da Rocha 1,2 and Rafael Timóteo de Sousa Júnior 2 1 Bank of Brazil, Brasília-DF, Brazil [email protected] 2 Network Engineering
CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.
Lecture Machine Learning Milos Hauskrecht [email protected] 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht [email protected] 539 Sennott
Email Spam Detection A Machine Learning Approach
Email Spam Detection A Machine Learning Approach Ge Song, Lauren Steimle ABSTRACT Machine learning is a branch of artificial intelligence concerned with the creation and study of systems that can learn
A Survey on Pre-processing and Post-processing Techniques in Data Mining
, pp. 99-128 http://dx.doi.org/10.14257/ijdta.2014.7.4.09 A Survey on Pre-processing and Post-processing Techniques in Data Mining Divya Tomar and Sonali Agarwal Indian Institute of Information Technology,
REVIEW OF HEART DISEASE PREDICTION SYSTEM USING DATA MINING AND HYBRID INTELLIGENT TECHNIQUES
REVIEW OF HEART DISEASE PREDICTION SYSTEM USING DATA MINING AND HYBRID INTELLIGENT TECHNIQUES R. Chitra 1 and V. Seenivasagam 2 1 Department of Computer Science and Engineering, Noorul Islam Centre for
Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News
Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News Sushilkumar Kalmegh Associate Professor, Department of Computer Science, Sant Gadge Baba Amravati
Chapter 12 Discovering New Knowledge Data Mining
Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to
BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL
The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University
DATA PREPARATION FOR DATA MINING
Applied Artificial Intelligence, 17:375 381, 2003 Copyright # 2003 Taylor & Francis 0883-9514/03 $12.00 +.00 DOI: 10.1080/08839510390219264 u DATA PREPARATION FOR DATA MINING SHICHAO ZHANG and CHENGQI
Data Mining for Customer Service Support. Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin
Data Mining for Customer Service Support Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin Traditional Hotline Services Problem Traditional Customer Service Support (manufacturing)
Comparative Analysis of EM Clustering Algorithm and Density Based Clustering Algorithm Using WEKA tool.
International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 9, Issue 8 (January 2014), PP. 19-24 Comparative Analysis of EM Clustering Algorithm
Data Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
Inner Classification of Clusters for Online News
Inner Classification of Clusters for Online News Harmandeep Kaur 1, Sheenam Malhotra 2 1 (Computer Science and Engineering Department, Shri Guru Granth Sahib World University Fatehgarh Sahib) 2 (Assistant
A survey on Data Mining based Intrusion Detection Systems
International Journal of Computer Networks and Communications Security VOL. 2, NO. 12, DECEMBER 2014, 485 490 Available online at: www.ijcncs.org ISSN 2308-9830 A survey on Data Mining based Intrusion
How To Use Neural Networks In Data Mining
International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and
DATA MINING USING INTEGRATION OF CLUSTERING AND DECISION TREE
DATA MINING USING INTEGRATION OF CLUSTERING AND DECISION TREE 1 K.Murugan, 2 P.Varalakshmi, 3 R.Nandha Kumar, 4 S.Boobalan 1 Teaching Fellow, Department of Computer Technology, Anna University 2 Assistant
Improving spam mail filtering using classification algorithms with discretization Filter
International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) International Journal of Emerging Technologies in Computational
Decision-Tree Learning
Decision-Tree Learning Introduction ID3 Attribute selection Entropy, Information, Information Gain Gain Ratio C4.5 Decision Trees TDIDT: Top-Down Induction of Decision Trees Numeric Values Missing Values
PREDICTING STUDENTS PERFORMANCE USING ID3 AND C4.5 CLASSIFICATION ALGORITHMS
PREDICTING STUDENTS PERFORMANCE USING ID3 AND C4.5 CLASSIFICATION ALGORITHMS Kalpesh Adhatrao, Aditya Gaykar, Amiraj Dhawan, Rohit Jha and Vipul Honrao ABSTRACT Department of Computer Engineering, Fr.
Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100
Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100 Erkan Er Abstract In this paper, a model for predicting students performance levels is proposed which employs three
Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca
Clustering Adrian Groza Department of Computer Science Technical University of Cluj-Napoca Outline 1 Cluster Analysis What is Datamining? Cluster Analysis 2 K-means 3 Hierarchical Clustering What is Datamining?
Data Mining Analytics for Business Intelligence and Decision Support
Data Mining Analytics for Business Intelligence and Decision Support Chid Apte, T.J. Watson Research Center, IBM Research Division Knowledge Discovery and Data Mining (KDD) techniques are used for analyzing
Data Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka ([email protected]) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10
1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom
152 International Journal of Computer Science and Technology. Paavai Engineering College, India
IJCST Vol. 2, Issue 3, September 2011 Improving the Performance of Data Mining Algorithms in Health Care Data 1 P. Santhi, 2 V. Murali Bhaskaran, 1,2 Paavai Engineering College, India ISSN : 2229-4333(Print)
How To Cluster
Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main
Prediction of Stock Performance Using Analytical Techniques
136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University
International Journal of Advance Research in Computer Science and Management Studies
Volume 3, Issue 11, November 2015 ISSN: 2321 7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online
Introduction to Data Mining Techniques
Introduction to Data Mining Techniques Dr. Rajni Jain 1 Introduction The last decade has experienced a revolution in information availability and exchange via the internet. In the same spirit, more and
Experiments in Web Page Classification for Semantic Web
Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address
Knowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right
Predicting required bandwidth for educational institutes using prediction techniques in data mining (Case Study: Qom Payame Noor University)
260 IJCSNS International Journal of Computer Science and Network Security, VOL.11 No.6, June 2011 Predicting required bandwidth for educational institutes using prediction techniques in data mining (Case
Prototype-based classification by fuzzification of cases
Prototype-based classification by fuzzification of cases Parisa KordJamshidi Dep.Telecommunications and Information Processing Ghent university [email protected] Bernard De Baets Dep. Applied Mathematics
Professor Anita Wasilewska. Classification Lecture Notes
Professor Anita Wasilewska Classification Lecture Notes Classification (Data Mining Book Chapters 5 and 7) PART ONE: Supervised learning and Classification Data format: training and test data Concept,
