Intro to Data Mining

Chapter 2 Exercises:

8. Discuss why a document-term matrix is an example of a data set that has asymmetric discrete or asymmetric continuous features.

9. Many sciences rely on observation instead of (or in addition to) designed experiments. Compare the data quality issues involved in observational science with those of experimental science and data mining.

10. Discuss the difference between the precision of a measurement and the terms single and double precision, as they are used in computer science, typically to represent floating-point numbers that require 32 and 64 bits, respectively.

18. This exercise compares and contrasts some similarity and distance measures.
(a) For binary data, the L1 distance corresponds to the Hamming distance; that is, the number of bits that are different between two binary vectors. The Jaccard similarity is a measure of the similarity between two binary vectors. Compute the Hamming distance and the Jaccard similarity between the following two binary vectors.
x = 0101010001
y = 0100011000
(b) Which approach, Jaccard or Hamming distance, is more similar to the Simple Matching Coefficient, and which approach is more similar to the cosine measure? Explain. (Note: The Hamming measure is a distance, while the other three measures are similarities, but don't let this confuse you.)
(c) Suppose that you are comparing how similar two organisms of different species are in terms of the number of genes they share. Describe which measure, Hamming or Jaccard, you think would be more appropriate for comparing the genetic makeup of two organisms. Explain. (Assume that each organism is represented as a binary vector, where each attribute is 1 if a particular gene is present in the organism and 0 otherwise.)
(d) If you wanted to compare the genetic makeup of two organisms of the same species, e.g., two human beings, would you use the Hamming distance, the Jaccard coefficient, or a different measure of similarity or distance? Explain. (Note that two human beings share > 99.9% of the same genes.)

22. Discuss how you might map correlation values from the interval [-1,1] to the interval [0,1]. Note that the type of transformation that you use might depend on the application that you have in mind. Thus, consider two applications: clustering time series and predicting the behavior of one time series given another.

27. Show that the distance measure defined as the angle between two data vectors, x and y, satisfies the metric properties given on page 70. Specifically, d(x, y) = arccos(cos(x, y)).

Chapter 3 Exercises:

5. Describe how you would create visualizations to display information that describes the following types of systems.
(a) Computer networks. Be sure to include both the static aspects of the network, such as connectivity, and the dynamic aspects, such as traffic.
(b) The distribution of specific plant and animal species around the world for a specific moment in time.
(c) The use of computer resources, such as processor time, main memory, and disk, for a set of benchmark database programs.
(d) The change in occupation of workers in a particular country over the last thirty years. Assume that you have yearly information about each person that also includes gender and level of education.
Be sure to address the following issues:
* Representation. How will you map objects, attributes, and relationships to visual elements?
* Arrangement. Are there any special considerations that need to be taken into account with respect to how visual elements are displayed? Specific examples might be the choice of viewpoint, the use of transparency, or the separation of certain groups of objects.
* Selection. How will you handle a large number of attributes and data objects?

17. Discuss the differences between dimensionality reduction based on aggregation and dimensionality reduction based on techniques such as PCA and SVD.
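For Exercise 17 in Chapter 3, the sketch below contrasts the two approaches on a two-attribute toy data set. It assumes that "aggregation" means combining attributes with fixed, user-chosen weights (here a plain average, an arbitrary choice), whereas PCA derives its weights from the data's covariance matrix. The function names and the toy rows are hypothetical. With only two attributes the leading eigenvector has a closed form, so no linear algebra library is needed; with more attributes PCA is normally computed through an SVD of the centered data matrix, whose right singular vectors give the same principal directions.

```python
import math

def aggregate_average(rows):
    """Aggregation: replace the two attributes with their average, using fixed weights (0.5, 0.5)."""
    return [(a + b) / 2 for a, b in rows]

def first_principal_component(rows):
    """Return the unit leading eigenvector of the 2x2 sample covariance matrix,
    the column means, and the fraction of total variance that direction explains."""
    n = len(rows)
    mean_a = sum(a for a, _ in rows) / n
    mean_b = sum(b for _, b in rows) / n
    s_aa = sum((a - mean_a) ** 2 for a, _ in rows) / (n - 1)
    s_bb = sum((b - mean_b) ** 2 for _, b in rows) / (n - 1)
    s_ab = sum((a - mean_a) * (b - mean_b) for a, b in rows) / (n - 1)
    total = s_aa + s_bb
    if total == 0:
        raise ValueError("PCA needs data with non-zero variance")
    # Closed-form largest eigenvalue of the symmetric matrix [[s_aa, s_ab], [s_ab, s_bb]].
    lam = total / 2 + math.sqrt(((s_aa - s_bb) / 2) ** 2 + s_ab ** 2)
    if s_ab != 0:
        v = (s_ab, lam - s_aa)  # satisfies (C - lam * I) v = 0
    else:
        v = (1.0, 0.0) if s_aa >= s_bb else (0.0, 1.0)  # covariance is already diagonal
    norm = math.hypot(*v)
    direction = (v[0] / norm, v[1] / norm)
    return direction, (mean_a, mean_b), lam / total

def pca_project(rows):
    """PCA: project the centered rows onto the data-derived leading direction."""
    (w_a, w_b), (mean_a, mean_b), _ = first_principal_component(rows)
    return [(a - mean_a) * w_a + (b - mean_b) * w_b for a, b in rows]

if __name__ == "__main__":
    # Hypothetical rows with two strongly correlated attributes.
    rows = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
    direction, _, explained = first_principal_component(rows)
    print("aggregated:", aggregate_average(rows))
    print("PCA weights:", direction, "variance explained:", round(explained, 4))
    print("PCA scores:", pca_project(rows))
```

The aggregated attribute is easy to interpret but its weights ignore how the attributes actually vary; the PCA weights adapt to the data and retain as much variance as possible, at the cost of a new attribute that is a less interpretable linear combination.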
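For Exercise 8 in Chapter 2, a toy document-term matrix can make "asymmetric" concrete. This is a minimal sketch assuming lowercase whitespace tokenization and raw term counts; the three documents are made up for illustration, and `document_term_matrix` and `zero_fraction` are hypothetical helper names. Raw counts give discrete entries; weighting them (for example with tf-idf) would give continuous entries with the same pattern of zeros.

```python
from collections import Counter

def document_term_matrix(docs):
    """Build a term-count matrix: one row per document, one column per distinct term."""
    # Assumption: lowercasing plus a whitespace split is enough tokenization for a toy example.
    tokenized = [doc.lower().split() for doc in docs]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    counts = [Counter(tokens) for tokens in tokenized]
    # Counter returns 0 for terms a document does not contain.
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix

def zero_fraction(matrix):
    """Fraction of cells that are zero, i.e. (document, term) pairs where the term is absent."""
    cells = [value for row in matrix for value in row]
    return sum(1 for value in cells if value == 0) / len(cells)

if __name__ == "__main__":
    # Hypothetical toy documents, not taken from any real collection.
    docs = [
        "data mining finds patterns",
        "the cat sat on the mat",
        "mining patterns in text data",
    ]
    vocabulary, matrix = document_term_matrix(docs)
    print(vocabulary)
    for row in matrix:
        print(row)
    print(f"zero cells: {zero_fraction(matrix):.2f}")
```

Two documents that both lack a term share a 0 in that column, which says little about whether they are similar; with a realistic vocabulary of thousands of terms the zero fraction is typically far higher than in this toy case.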
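For Exercise 10 in Chapter 2, the storage side of the question can be made concrete by rounding a value to 32 bits and comparing it with Python's native 64-bit float. This is a minimal sketch using the standard `struct` module; `to_float32` is a hypothetical helper name. It only shows how many digits each format can hold (roughly 7 significant decimal digits for single precision versus roughly 15 to 16 for double precision); it says nothing about the precision of the measurement that produced the value.

```python
import struct

def to_float32(value: float) -> float:
    """Round a Python float (an IEEE 754 double) to the nearest single-precision value."""
    # "<f" packs into exactly 4 bytes (32 bits); unpacking widens the rounded value back to a double.
    return struct.unpack("<f", struct.pack("<f", value))[0]

if __name__ == "__main__":
    for value in (0.1, 1 / 3, 123456.789):
        single = to_float32(value)
        print(f"double (64-bit): {value!r}")
        print(f"single (32-bit): {single!r}")
        print(f"error from storing in 32 bits: {abs(single - value):.1e}")
        print()
```

A length read off a ruler to the nearest millimetre has the same measurement precision whether it is stored in 32 or 64 bits; the extra bits cannot add information the instrument never captured.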
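For Exercise 18 in Chapter 2, parts (a) and (b), all four measures can be written in terms of the match counts f11, f10, f01 and f00 (positions where the pair of bits is 1-1, 1-0, 0-1 and 0-0). The sketch below assumes the vectors are given as strings of '0' and '1' characters; the function names are hypothetical. Writing the measures side by side shows which ones use the 0-0 count f00 and which ones ignore it.

```python
from math import sqrt

def match_counts(x: str, y: str) -> dict:
    """Count f11, f10, f01 and f00 for two equal-length strings of '0'/'1' characters."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    counts = {"f11": 0, "f10": 0, "f01": 0, "f00": 0}
    for a, b in zip(x, y):
        counts["f" + a + b] += 1
    return counts

def hamming(x, y):
    """Number of positions where the bits differ (the L1 distance for binary data)."""
    c = match_counts(x, y)
    return c["f10"] + c["f01"]

def jaccard(x, y):
    """f11 / (f11 + f10 + f01): 0-0 matches are ignored."""
    c = match_counts(x, y)
    denom = c["f11"] + c["f10"] + c["f01"]
    if denom == 0:
        raise ValueError("Jaccard is undefined when both vectors are all zeros")
    return c["f11"] / denom

def smc(x, y):
    """Simple Matching Coefficient: (f11 + f00) / n, so 0-0 matches count as agreement."""
    c = match_counts(x, y)
    return (c["f11"] + c["f00"]) / len(x)

def cosine(x, y):
    """Cosine similarity; for binary vectors x.y = f11, |x|^2 = f11 + f10, |y|^2 = f11 + f01."""
    c = match_counts(x, y)
    denom = sqrt((c["f11"] + c["f10"]) * (c["f11"] + c["f01"]))
    if denom == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return c["f11"] / denom

if __name__ == "__main__":
    x, y = "0101010001", "0100011000"
    print(match_counts(x, y))
    print("Hamming:", hamming(x, y))
    print("Jaccard:", jaccard(x, y))
    print("SMC:    ", smc(x, y))
    print("cosine: ", round(cosine(x, y), 4))
```

Because the Hamming distance counts mismatches, it always equals n × (1 − SMC) for vectors of length n; Jaccard and cosine, by contrast, both leave f00 out entirely.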
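For Exercise 22 in Chapter 2, two simple candidate transformations are sketched below; as the exercise notes, which one fits depends on the application, so this is an illustration rather than a prescribed answer, and the function names are hypothetical. The linear rescaling keeps the sign, so anti-correlated series map to 0; that may suit clustering time series if series that move in opposite directions should not be grouped together. The absolute value treats a strong negative correlation like a strong positive one; that may suit prediction, since a series that reliably moves opposite to another is still a good predictor of it.

```python
def _check_correlation(r: float) -> None:
    """Reject values outside the valid correlation range [-1, 1]."""
    if not -1.0 <= r <= 1.0:
        raise ValueError(f"correlation must lie in [-1, 1], got {r}")

def linear_rescale(r: float) -> float:
    """Map [-1, 1] linearly onto [0, 1]: -1 -> 0, 0 -> 0.5, 1 -> 1 (the sign is preserved)."""
    _check_correlation(r)
    return (r + 1.0) / 2.0

def absolute_strength(r: float) -> float:
    """Map r to |r|: only the strength of the linear relationship matters, not its direction."""
    _check_correlation(r)
    return abs(r)

if __name__ == "__main__":
    for r in (-1.0, -0.8, 0.0, 0.8, 1.0):
        print(f"r = {r:+.1f}  linear = {linear_rescale(r):.2f}  absolute = {absolute_strength(r):.2f}")
```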
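For Exercise 27 in Chapter 2, a numerical check cannot replace the proof the exercise asks for, but it can catch a wrong intuition before attempting one. The sketch below computes d(x, y) = arccos(cos(x, y)) and tests non-negativity, symmetry and the triangle inequality on random vectors; the helper names, vector dimension, trial count and tolerance are arbitrary assumptions. It also surfaces a subtlety worth handling in the proof: scaling a vector does not change the angle, so d(x, 3x) = 0 even though x ≠ 3x, and the property "d(x, y) = 0 only if x = y" holds for unit-length vectors (directions) rather than for arbitrary vectors.

```python
import math
import random

def angle(x, y):
    """Angle in radians between non-zero vectors x and y, i.e. arccos(cos(x, y))."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("the angle is undefined for a zero vector")
    # Clamp: floating-point rounding can push the cosine slightly outside [-1, 1].
    cos_xy = max(-1.0, min(1.0, dot / (norm_x * norm_y)))
    return math.acos(cos_xy)

def check_metric_properties(trials=10_000, dim=5, seed=0, tol=1e-9):
    """Randomized check (not a proof) of non-negativity, symmetry and the triangle inequality."""
    rng = random.Random(seed)

    def random_vector():
        return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

    for _ in range(trials):
        x, y, z = random_vector(), random_vector(), random_vector()
        d_xy, d_yz, d_xz = angle(x, y), angle(y, z), angle(x, z)
        if d_xy < 0 or abs(d_xy - angle(y, x)) > tol or d_xz > d_xy + d_yz + tol:
            return False
    return True

if __name__ == "__main__":
    print("properties hold on random samples:", check_metric_properties())
    # Distinct vectors pointing in the same direction are at angle 0.
    print("d(x, 3x) =", angle([1.0, 0.0], [3.0, 0.0]))
```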