KEYNOTE SPEAKER / TUTORIAL
Vaclav Snasel (VSB-Technical University of Ostrava, Czech Republic)
Vaclav Snasel's research and development experience spans over 25 years in industry and academia. He works in a multidisciplinary environment involving artificial intelligence, multidimensional data indexing, conceptual lattices, information retrieval, the semantic web, knowledge management, data compression, machine intelligence, neural networks, web intelligence, and data mining, applied to various real-world problems. He has given more than 6 plenary lectures and conference tutorials in these areas. He has authored or co-authored several refereed journal and conference papers and book chapters. He has published more than 350 papers (100 of them indexed in Web of Science). He has supervised many Ph.D. students from the Czech Republic, Jordan, Yemen, Slovakia, Ukraine and Vietnam.
Since 2001 he has been a visiting scientist at the Institute of Computer Science, Academy of Sciences of the Czech Republic. Since 2003 he has been vice-dean for Research and Science at the Faculty of Electrical Engineering and Computer Science, VSB-Technical University of Ostrava, Czech Republic. He has been a full professor since 2006. Before becoming a full-time academic, he worked for an industrial company, where he was involved in various industrial research and development projects for nearly 8 years. He received a Ph.D. degree in Algebra and Geometry from Masaryk University, Brno, Czech Republic, and a Master of Science degree from Palacky University, Olomouc, Czech Republic.
Keynote Abstract:
Binary Data Mining
Binary data occupy a special place in the domain of data analysis. Analysis of binary data sets, however, generally leads to NP-complete or NP-hard problems. Consequently, the focus here is on effective heuristics for reducing the problem size.
Multilinear algebra, the algebra of higher-order tensors, offers a potent mathematical framework for analyzing the multifactor structure of high-dimensional real-world data. The multilinear modeling technique employs tensor extensions of conventional matrix decompositions such as singular value decomposition (SVD), nonnegative matrix factorization (NMF), semidiscrete decomposition (SDD), etc.
There are several well-known methods and algorithms for the factorization of real-valued data, but many application areas, including information retrieval, pattern recognition and data mining, require processing of binary rather than real-valued data. Unfortunately, the methods used for real matrix factorization fail in the latter case. In this lecture we introduce the background for binary matrix factorization. In order to perform object recognition (of any kind), it is necessary to learn representations of the underlying characteristic components. Such components correspond to object parts, or features. The data sets involved may comprise discrete attributes, such as those from market basket analysis, information retrieval, and bioinformatics, as well as continuous attributes, such as those in scientific simulations, astrophysical measurements, and sensor networks. Feature extraction, when applied to binary data sets, serves many research and application fields, such as association rule mining, market basket analysis, discovery of regulation patterns in DNA microarray experiments, etc. The so-called bars problem is used as a benchmark: a set of artificial signals, each generated as a Boolean sum of a given number of bars, is analyzed by these methods. Here we concentrate on the case of black-and-white pictures of bar combinations represented as binary vectors, so complex feature extraction methods are unnecessary. More generally, we can employ lattice theory within the field of computational intelligence.
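As an illustration of the bars benchmark described above, the following minimal sketch generates binary signals as Boolean sums of bars and expresses them as a Boolean matrix product. The grid size, the probability `p` of a bar being present, and the function names `make_bars`, `boolean_product` and `make_signals` are assumptions chosen for this example; they are not taken from the lecture.

```python
import random


def make_bars(n):
    """Return the 2*n bars of an n x n grid as flat 0/1 vectors."""
    bars = []
    for r in range(n):  # horizontal bars: one full row set to 1
        bars.append([1 if i // n == r else 0 for i in range(n * n)])
    for c in range(n):  # vertical bars: one full column set to 1
        bars.append([1 if i % n == c else 0 for i in range(n * n)])
    return bars


def boolean_product(A, B):
    """Boolean matrix product: C[i][j] = OR over k of (A[i][k] AND B[k][j])."""
    return [
        [int(any(a and b for a, b in zip(row, col))) for col in zip(*B)]
        for row in A
    ]


def make_signals(n, count, p, seed=0):
    """Generate `count` signals; each bar is present with probability p (assumed)."""
    rng = random.Random(seed)
    bars = make_bars(n)
    # usage[i][k] == 1 means that signal i contains bar k
    usage = [[int(rng.random() < p) for _ in bars] for _ in range(count)]
    # Each signal is the Boolean sum (element-wise OR) of the bars it uses
    signals = boolean_product(usage, bars)
    return usage, bars, signals


usage, bars, signals = make_signals(n=4, count=6, p=0.25)
for signal in signals:
    print("".join(str(bit) for bit in signal))
```

In this sketch, binary matrix factorization would be the inverse task: given only `signals`, recover binary factors playing the roles of `usage` and `bars`. Where two bars cross, the Boolean sum yields 1 rather than 2, which is one way to see why the data do not fit an ordinary real-valued product.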
Many applications in computer and system science, such as clustering, classification, pattern analysis and matrix decomposition, involve the analysis of large-scale and often high-dimensional lattice data. Therefore, suitable methods for approximating the data in lower dimensions or with lower rank are needed. In the following, we focus on the factorization of high-dimensional binary or lattice data and of high-order binary or lattice tensors.
Tutorial Abstract:
Web Content Mining
A Web page is like a family house. Each part has its meaning, determined by the purpose it serves. Every part can be named so that everybody imagines approximately the same thing under that name (living room, bathroom, lobby, bedroom, kitchen, balcony, ...). So that the inhabitants can find their way around the house, certain rules are kept. From the point of view of these rules, all houses are similar. That is why first-time visitors usually have no problem finding their way around a house. Thanks to the names, we can describe the house quite precisely. If we add more detailed information to the description, such as location, sizes, colours, equipment and further details, then a future visitor can get an almost perfect notion of what he will see in the house when he comes in for the first time. We can describe a building other than a family house (school, supermarket, office, etc.) in a similar way. The same applies to visitors in this case, and finding their way around is usually not a problem (of course, this does not always have to be so; just as there are bad Web pages, there are also bad buildings).
Let us look at the problem from the other side. If we visit a building with a blindfolded person, we can set him basically three tasks. The first is to find out what the purpose of the building is. The second is to find out what parts (e.g. rooms) the building contains, and the third can be linked, for example, to the equipment of individual rooms. When solving these tasks, it is probably possible to start with any of them. There is another important issue. If the visitor completes some of the tasks and we ask him to describe the result, he will certainly use commonly used names that describe the type of building, its parts and, finally, its equipment. The architect Christopher Alexander introduced a similar and, to a certain extent, formalized way of description. In our tutorial, we work with a Web page in a similar way. We have shown that this way of looking at a Web page can, moreover, be a good tool for classifying some approaches in the field of Web content mining. Furthermore, within our own research, we managed to verify experimentally that it is reasonable to describe a Web page by its named parts. This holds true both for the design of methods for detecting page semantics and for a technically usable description of the page by the user.