Hit List

Title:

k-STARs: Sequences of Spatio-Temporal Association Rules

Description:

A Spatio-Temporal Association Rule (STAR) describes how objects move between regions over time. Since a STAR describes only a single movement between two regions, it is very difficult to see larger patterns in a dataset by considering the set of STARs alone. This is especially difficult on complex datasets where the underlying patterns overlap. At best we will miss important patterns, being unable to “see the forest for the trees”, and at worst this can lead to false interpretations. We introduce the k-STAR pattern, which describes the sequences of STARs that objects obey. Since a k-STAR captures sequences of object movements, it solves these problems. We also allow space and time gaps between successive STARs, and support ‘replenishable’ k-STARs, so that we are able to capture the rich set of patterns that exist in real-world data. We define a lattice on the k-STARs that allows the user to drill down and drill up in order to explore the patterns in detail or view them at a higher level. We introduce two important measures, min-l-support and min-l-confidence, that allow us to achieve the above. This paper gives a rigorous theoretical treatment of k-STARs, proving various anti-monotonic and weakly anti-monotonic properties that can be exploited to mine k-STARs efficiently. We describe an algorithm, k-STARMiner, that uses these results to mine the lattice of k-STARs.

Contributors:

The Pennsylvania State University CiteSeerX Archives

Year of Publication:

2008-12-04

Source:

http://www.it.usyd.edu.au/research/tr/tr589.pdf

Document Type:

text

Language:

en

DDC:

520 Astronomy & allied sciences *(computed)*

Rights:

Metadata may be used without restrictions as long as the oai identifier remains attached to it.

Title:

Sinks, Stationary Regions and Thoroughfares in Object Mobility Databases

Description:

As mobile devices proliferate and networks become more location-aware, the corresponding growth in spatio-temporal data will demand analysis techniques to mine patterns that take into account the semantics of such data. Association Rule Mining (ARM) has been one of the more extensively studied data mining techniques, but it considers discrete transactional data (supermarket or sequential). Most attempts to apply this technique to spatio-temporal domains map the data to transactions, thus losing the spatio-temporal characteristics. We provide a comprehensive definition of spatio-temporal association rules (STARs) that describe how objects move between regions over time. We define support in the spatio-temporal domain to effectively deal with the semantics of such data. We also introduce other patterns that are useful for mobility data: stationary regions and high traffic regions. The latter consist of sources, sinks and thoroughfares. These patterns describe important temporal characteristics of regions, and we show that they can be considered as special STARs. We provide efficient algorithms to find these patterns by exploiting several pruning properties.

Contributors:

The Pennsylvania State University CiteSeerX Archives

Year of Publication:

2008-07-01

Source:

http://www.cs.usyd.edu.au/~chawla/papers/dasfaa06.pdf

Document Type:

text

Language:

en

DDC:

006 Special computer methods *(computed)*

Rights:

Metadata may be used without restrictions as long as the oai identifier remains attached to it.

Title:

Mining Complex, Maximal and Complete Sub-graphs and Sets of Correlated Variables with Applications to Feature Subset Selection

Description:

Finding interactions between variables is a fundamental concept in Data Mining. In this work, correlations between variables are considered using Pearson’s product-moment correlation coefficient. Of interest are complex, complete, and maximal sub-graphs which describe the correlation structure between variables. This paper considers both positive and negative correlations – complex interactions. It is proved that under a constraint on the minimum level of correlation desired, there are useful guarantees on the structure of the correlations. In particular, the sign of the correlation between variables can be mapped to the variables themselves (i.e. to the vertices). This means that the complete complex sub-graphs can be represented as a complex set, where each element – a variable with a positive or a negative sign – is highly positively correlated with every other. This makes the interaction much easier to understand. It is also exploited to develop an algorithm that runs in the same time as if complex interactions were not considered, resulting in significantly improved scalability. Mining maximal sets of variables characterized by the lack of correlations is also briefly considered. The approach is useful for examining complex correlation structures, as well as mining a representative subset of the entire data set. The latter idea is extended to the problem of feature subset selection in a way that gives guarantees on the minimum correlation required for features to be considered interchangeable (redundant), while guaranteeing that the selected features are not correlated with each other. Experiments show the approach performs well.

Contributors:

The Pennsylvania State University CiteSeerX Archives

Year of Publication:

2009-06-11

Source:

http://www.siam.org/proceedings/datamining/2008/dm08_55_verhein.pdf

Document Type:

text

Language:

en

DDC:

006 Special computer methods *(computed)*

Rights:

Metadata may be used without restrictions as long as the oai identifier remains attached to it.

Title:

Mining complex spatio-temporal sequence patterns

Description:

Mining sequential movement patterns describing group behaviour in potentially streaming spatio-temporal data sets is a challenging problem. Movements are typically noisy and often overlap each other. This makes a set of simple patterns difficult to interpret and sequences difficult to mine. Furthermore, group behaviour is complex. Objects in a group may behave similarly for a period of time (an interesting pattern sequence), then split up, either spatially, temporally or both, making a series of uninteresting movements before rejoining. This behaviour must be captured in a single pattern for that group, rather than a number of unconnected pattern sequences. Secondly, it often occurs that individual objects only move along segments of a path, perhaps between intersections in a road or highway. However, the entire path is interesting when all such behaviours are taken together. Therefore, a pattern describing such behaviour should be found, rather than just a number of short sequences. This paper solves these challenges, among others, by mining sequences of Spatio-Temporal Association Rules. Theoretical results are exploited in order to develop an efficient algorithm, which is demonstrated to have linear run time in the number of interesting sequences discovered. A lattice for drill down and roll up exploratory analysis of the sequence patterns is proposed. Finally, verifiable and interesting patterns possessing the above characteristics are found in a real-world animal tracking data set.

Contributors:

The Pennsylvania State University CiteSeerX Archives

Year of Publication:

2012-04-01

Source:

http://www.siam.org/proceedings/datamining/2009/dm09_056_verheinf.pdf

Document Type:

text

Language:

en

Rights:

Metadata may be used without restrictions as long as the oai identifier remains attached to it.

Title:

Geometrically Inspired Itemset Mining

Description:

In our geometric view, an itemset is a vector (itemvector) in the space of transactions. The support of an itemset is the generalized dot product of the participating items. Linear and potentially nonlinear transformations can be applied to the itemvectors before mining patterns. Aggregation functions can be applied to the transformed vectors and pushed inside the mining process. We show that interesting itemset mining can be carried out by instantiating four abstract functions: a transformation (g), an algebraic aggregation operator (◦) and measures (f and F). For frequent itemset mining (FIM), g and F are identity transformations, ◦ is intersection and f is the cardinality. Based on the geometric view we present a novel algorithm that uses space linear in the number of 1-itemsets to mine all interesting itemsets in a single pass over the data, with no candidate generation. It scales (roughly) linearly in running time with the number of interesting itemsets. FIM experiments show that it outperforms FP-Growth on realistic datasets above a small support threshold (0.29% and 1.2% in our experiments).
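
As a rough illustration of the FIM instantiation described in this abstract (g and F as identities, ◦ as intersection, f as cardinality), the Python sketch below represents each itemvector as the set of transaction ids that contain the item. The function name `support`, its parameter names, and the toy data are assumptions made for illustration only; this is not the paper's mining algorithm, just the support computation that the four abstract functions generalize.

```python
from functools import reduce


def support(itemset, itemvectors,
            g=lambda v: v,                    # transformation (identity for FIM)
            combine=frozenset.intersection,   # aggregation operator (intersection for FIM)
            f=len,                            # measure on the combined vector (cardinality for FIM)
            F=lambda x: x):                   # outer measure (identity for FIM)
    """Support of a non-empty itemset from per-item transaction-id sets."""
    transformed = [g(itemvectors[item]) for item in itemset]
    return F(f(reduce(combine, transformed)))


# Hypothetical toy database: item -> ids of the transactions containing it.
itemvectors = {
    "a": frozenset({1, 2, 3}),
    "b": frozenset({2, 3, 4}),
    "c": frozenset({3, 5}),
}

print(support(["a", "b"], itemvectors))       # transactions 2 and 3 contain both items
print(support(["a", "b", "c"], itemvectors))  # only transaction 3 contains all three
```

Swapping in a different `g`, `combine`, `f` or `F` changes the interestingness measure without changing the calling code, which is the point of the abstraction.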

Contributors:

The Pennsylvania State University CiteSeerX Archives

Year of Publication:

2008-07-17

Source:

http://www.it.usyd.edu.au/~chawla/papers/icdm06_tr.pdf

Document Type:

text

Language:

en

DDC:

006 Special computer methods *(computed)*

Rights:

Metadata may be used without restrictions as long as the oai identifier remains attached to it.

Title:

Using significant positively associated and relatively class correlated rules for associative classification of imbalanced datasets

Description:

The application of association rule mining to classification has led to a new family of classifiers which are often referred to as “Associative Classifiers (ACs)”. An advantage of ACs is that they are rule-based and thus lend themselves to an easier interpretation. Rule-based classifiers can play a very important role in applications such as medical diagnosis and fraud detection where “imbalanced data sets” are the norm and not the exception. The focus of this paper is to extend and modify ACs for classification on imbalanced data sets using only statistical techniques. We combine the use of statistically significant rules with a new measure, the Class Correlation Ratio (CCR), to build an AC which we call SPARCCC. Experiments show that in terms of classification quality, SPARCCC performs comparably on balanced datasets and outperforms other AC techniques on imbalanced data sets. It also has a significantly smaller rule base and is much more computationally efficient.

Publisher:

IEEE Computer Society Press

Contributors:

The Pennsylvania State University CiteSeerX Archives

Year of Publication:

2014-11-25

Source:

http://pmg.it.usyd.edu.au/papers/parccc.pdf

Document Type:

text

Language:

en

DDC:

006 Special computer methods *(computed)*

Rights:

Metadata may be used without restrictions as long as the oai identifier remains attached to it.

Title:

Mining spatio-temporal association rules, sources, sinks, stationary regions and thoroughfares in object mobility databases

Description:

As mobile devices proliferate and networks become more location-aware, the corresponding growth in spatio-temporal data will demand analysis techniques to mine patterns that take into account the semantics of such data. Association Rule Mining has been one of the more extensively studied data mining techniques, but it considers discrete transactional data (supermarket or sequential). Most attempts to apply this technique to spatio-temporal domains map the data to transactions, thus losing the spatio-temporal characteristics. We provide a comprehensive definition of spatio-temporal association rules (STARs) that describe how objects move between regions over time. We define support in the spatio-temporal domain to effectively deal with the semantics of such data. We also introduce other patterns that are useful for mobility data: stationary regions and high traffic regions. The latter consist of sources, sinks and thoroughfares. These patterns describe important temporal characteristics of regions, and we show that they can be considered as special STARs. We provide efficient algorithms to find these patterns by exploiting several pruning properties.

Publisher:

Springer

Contributors:

The Pennsylvania State University CiteSeerX Archives

Year of Publication:

2009-01-08

Source:

http://www.cs.usyd.edu.au/~fverhein/publications/verhein06stars_dasfaa.pdf

Document Type:

text

Language:

en

DDC:

006 Special computer methods *(computed)*

Rights:

Metadata may be used without restrictions as long as the oai identifier remains attached to it.

Title:

Geometrically Inspired Itemset Mining

Description:

In our geometric view, an itemset is a vector (itemvector) in the space of transactions. Linear and potentially non-linear transformations can be applied to the itemvectors before mining patterns. Aggregation functions and interestingness measures can be applied to the transformed vectors and pushed inside the mining process. We show that interesting itemset mining can be carried out by instantiating four abstract functions: a transformation (g), an algebraic aggregation operator (◦) and measures (f and F). For Frequent Itemset Mining (FIM), g and F are identity transformations, ◦ is intersection and f is the cardinality. Based on this geometric view we present a novel algorithm that uses space linear in the number of 1-itemsets to mine all interesting itemsets in a single pass over the data, with no candidate generation. It scales (roughly) linearly in running time with the number of interesting itemsets. FIM experiments show that it outperforms FP-Growth on realistic datasets above a small support threshold (0.29% and 1.2% in our experiments).

Contributors:

The Pennsylvania State University CiteSeerX Archives

Year of Publication:

2008-07-17

Source:

http://www.cs.usyd.edu.au/~fverhein/publications/verhein06glimit_icdm.pdf

Document Type:

text

Language:

en

DDC:

006 Special computer methods *(computed)*

Rights:

Metadata may be used without restrictions as long as the oai identifier remains attached to it.

Title:

Almost junk: classifying public announcements for user communities

Description:

This paper describes our work towards building a smart personal assistant that helps users deal with the ever-increasing volume of email. To this end, we filter incoming messages for the user, organising messages in ways that enable the user to deal with the information effectively. We describe an initial corpus of public email built for experiments on learning to classify email. We also report initial results of experiments based on that corpus. These explore the question of how well a classifier can perform the task of classifying a user’s email into categories that reflect the user’s thinking about that email. We also report initial experiments aimed at answering a second question: if a learner has been trained by one user, how effective is it for another user? That is, can we share information about classifications of messages to improve collective performance?

Contributors:

The Pennsylvania State University CiteSeerX Archives

Year of Publication:

2008-12-04

Source:

http://cs.anu.edu.au/~Eric.McCreath/papers/adcs2003.pdf

Document Type:

text

Language:

en

Rights:

Metadata may be used without restrictions as long as the oai identifier remains attached to it.

Title:

Probabilistic frequent itemset mining in uncertain databases

Description:

Probabilistic frequent itemset mining in uncertain transaction databases differs semantically and computationally from traditional techniques applied to standard “certain” transaction databases. The consideration of existential uncertainty of item(sets), indicating the probability that an item(set) occurs in a transaction, makes traditional techniques inapplicable. In this paper, we introduce new probabilistic formulations of frequent itemsets based on possible world semantics. In this probabilistic context, an itemset X is called frequent if the probability that X occurs in at least minSup transactions is above a given threshold τ. To the best of our knowledge, this is the first approach addressing this problem under possible worlds semantics. In consideration of the probabilistic formulations, we present a framework which is able to solve the Probabilistic Frequent Itemset Mining (PFIM) problem efficiently. An extensive experimental evaluation investigates the impact of our proposed techniques and shows that our approach is orders of magnitude faster than straightforward approaches.
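
To make the frequentness definition in this abstract concrete, the Python sketch below computes the probability that an itemset X occurs in at least minSup transactions and compares it with τ. It assumes that transactions are mutually independent and that the per-transaction occurrence probabilities of X are already known; the dynamic program is a standard way to evaluate such a probability and is not claimed to be the framework the paper presents. The function names and the example probabilities are hypothetical.

```python
def prob_support_at_least(probs, min_sup):
    """P(X occurs in at least min_sup transactions).

    probs[i] is the (assumed known) probability that X occurs in
    transaction i; transactions are assumed independent.
    """
    # dist[k] = probability that exactly k of the transactions seen so far contain X
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)   # X absent from this transaction
            nxt[k + 1] += mass * p       # X present in this transaction
        dist = nxt
    return sum(dist[min_sup:])


def is_probabilistic_frequent(probs, min_sup, tau):
    """True if P(support >= min_sup) is above the threshold tau."""
    return prob_support_at_least(probs, min_sup) > tau


# Hypothetical example: three uncertain transactions.
print(prob_support_at_least([0.9, 0.8, 0.1], 2))
print(is_probabilistic_frequent([0.9, 0.8, 0.1], 2, 0.7))
```

The table `dist` has one entry per possible support count, so the cost grows quadratically with the number of transactions; enumerating every possible world would instead grow exponentially.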

Contributors:

The Pennsylvania State University CiteSeerX Archives

Year of Publication:

2010-01-31

Source:

http://www.dbs.informatik.uni-muenchen.de/Publikationen/Papers/KDD09_PFIM.pdf

Document Type:

text

Language:

en

Subjects:

H.2.8 [Database Applications]: Data Mining; General Terms: Algorithms, Theory; Keywords: Uncertain Databases, Frequent Itemset Mining, Probabilistic Data, Probabilistic Frequent Itemsets

Rights:

Metadata may be used without restrictions as long as the oai identifier remains attached to it.

Currently in BASE: 68,072,316 Documents of 3,307 Content Sources

http://www.base-search.net