|
In the last decade we have experienced explosive growth of information on the web. Locating information seems very easy, while determining its quality can be tricky. This course is for students who want to know why search engines can answer their queries quickly and (most of the time) accurately, why at other times they seem to miss the point and return untrustworthy information, and how one can design a web site that achieves high visibility on the web. We will cover traditional information retrieval methods and web search techniques such as crawlers and spiders, with a focus on probabilistic and graph-theoretic methods that can detect web spam. We will also introduce the basics of text mining and data clustering. Time permitting, we will examine other relevant issues of the information explosion era, such as the shape and structure of the web, the epistemology of information, and the properties of large random networks.
|