A Tool for Web Usage Mining

Jose M. Domenech1 and Javier Lorenzo2

1 Hospital Juan Carlos I

Real del Castillo 152 - 3571 Las Palmas - Spain

jdomcab@gobiernodecanarias.org

2 Inst. of Intelligent Systems and Num. Applic. in Engineering

Univ. of Las Palmas

Campus Univ. de Tafira - 35017 Las Palmas - Spain

jlorenzo@iusiani.ulpgc.es

Abstract. This paper presents a tool for web usage mining. The aim is to
provide a tool that facilitates the mining process rather than to implement
elaborate algorithms and techniques. The tool covers different phases of the
CRISP-DM methodology, such as data preparation, data selection, modeling and
evaluation. The algorithms used in the modeling phase are those implemented
in the Weka project. The tool has been tested on a web site to find access
and navigation patterns.

1 Introduction

Discovering knowledge from large databases has received great attention during
the last decade, with data mining being the main tool for doing so [1]. The
World Wide Web has been considered the largest repository of information, but
it lacks a well-defined structure. The World Wide Web is thus a good
environment for data mining, an activity that receives the name of Web
Mining [2, 3].

Web mining can be divided into three main topics: Content Mining, Structure
Mining and Usage Mining. This work is focused on Web Usage Mining (WUM),
which has been defined as "the application of data mining techniques to
discover usage patterns from Web data" [4]. Web usage mining can provide
organizations with usage patterns from which to obtain customer profiles, so
that they can make website browsing easier or present specific products or
pages. The latter is of great interest to businesses because sales can
increase if only appealing products are offered to the customers, although,
as pointed out by Anand (Anand et al, 2004), it is difficult to present a
convincing case for Return on Investment. The success of data mining
applications, like that of many other applications, depends on the
development of a standard. CRISP-DM (Cross-Industry Standard Process for Data
Mining) (CRISP-DM, 2000) is the work of a consortium of companies that has
defined and validated a data mining process that can be used in different
data mining projects, such as web usage mining. CRISP-DM divides the life
cycle of a data mining project into six stages: Business Understanding, Data
Understanding, Data Preparation, Modeling, Evaluation and Deployment.

The Business Understanding phase is highly connected with the problem to be
solved because it defines the business objectives of the application. The
last one, Deployment, is not easy to automate because each organization has
its own information processing management. For the remaining stages a tool
can be designed to facilitate the work of web usage mining practitioners and
reduce the development effort of new applications.

8th International Conference on Intelligent Data Engineering and Automated Learning
(IDEAL'07), 16-19 December, 2007, Birmingham, UK.

In this work we implement the WEBMINER architecture [5], which divides the
WUM process into three main parts: preprocessing, pattern discovery and
pattern analysis. These three parts correspond to the data preparation,
modeling and evaluation phases of the CRISP-DM model.
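
The three-part division can be pictured as a pipeline in which each part is a replaceable function. The following Python sketch is only an illustration and is not the implementation described in this paper: the function names, the simplified "client url" log format, the filtered file extensions and the visit-count "pattern" are all assumptions chosen to keep the example short.

```python
from collections import Counter

# Correspondence stated in the text: WEBMINER part -> CRISP-DM phase.
WEBMINER_TO_CRISP_DM = {
    "preprocessing": "Data Preparation",
    "pattern discovery": "Modeling",
    "pattern analysis": "Evaluation",
}


def preprocess(log_lines):
    """Hypothetical cleaning step: keep (client, url) pairs for page requests.

    Assumes each log line is 'client url'; real server logs carry more fields.
    """
    records = []
    for line in log_lines:
        client, url = line.split()[:2]
        # Drop requests for embedded resources (assumed extension list).
        if not url.endswith((".gif", ".jpg", ".css")):
            records.append((client, url))
    return records


def discover(records):
    """Hypothetical discovery step: count how often each page was requested."""
    return Counter(url for _, url in records)


def analyse(patterns, min_count=2):
    """Hypothetical analysis step: keep pages requested at least min_count times."""
    return {url: n for url, n in patterns.items() if n >= min_count}


def run_pipeline(log_lines,
                 preprocessing=preprocess,
                 pattern_discovery=discover,
                 pattern_analysis=analyse):
    """Chain the three parts; any of them can be swapped for another callable."""
    return pattern_analysis(pattern_discovery(preprocessing(log_lines)))


if __name__ == "__main__":
    log = [
        "10.0.0.1 /index.html",
        "10.0.0.2 /index.html",
        "10.0.0.1 /logo.gif",
        "10.0.0.2 /about.html",
    ]
    print(run_pipeline(log))
```

Passing a different callable for one of the three keyword arguments replaces that part while leaving the other two untouched.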

In this paper we present a tool to facilitate Web Usage Mining based on the
WEBMINER architecture. The tool is conceived as a framework where different
techniques can be used in each stage, which facilitates experimentation and
eliminates the need to program the whole application when we are interested
in studying the effect of a new method on the mining process. The
architecture of the tool is shown in Figure 1, and the different elements
that make it up are described in the following sections. The paper is
organized as follows. Section 2 describes the data preprocessing. Sections 3
and 5 present different approaches to user session and transaction
identification. Finally, Sections 6 and 7 present the models to be generated
and the results.

[Figure 1: architecture of the tool (image not preserved). Labeled elements: web site, crawler, log (<<Table>>), data preprocessing, session identification, feature extraction, classifier training, clustering, association rules discovering.]

...
