Intrinsic Dimensionality Estimation for High-dimensional Data Sets: New Approaches for the Computation of Correlation Dimension
Jochen Einbeck
Zakiah Kalantan
The analysis of high-dimensional data is usually challenging since many standard modelling approaches tend to break down due to the so-called “curse of dimensionality”. Dimension reduction techniques, which reduce the data set (explicitly or implicitly) to a smaller number of variables, make the data analysis more efficient and are furthermore useful for visualization purposes. However, most dimension reduction techniques require fixing the intrinsic dimension of the low-dimensional subspace in advance. The intrinsic dimension can be estimated by fractal dimension estimation methods, which exploit the intrinsic geometry of a data set. The most popular concept from this family of methods is the correlation dimension, which requires estimation of the correlation integral for a ball of radius tending to 0. In this paper we propose approaches to approximate the correlation integral in this limit. Experimental results on real-world and simulated data are used to demonstrate the algorithms and to compare them with other methods. A simulation study which verifies the effectiveness of the proposed methods is also provided.
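To make the notion of correlation dimension concrete, the following is a minimal Python sketch of the classical log-log slope estimate: the correlation integral C(r) is taken as the fraction of point pairs closer than r, and the dimension is read off as the slope of log C(r) against log r at small radii. This is not the approach proposed in the paper; the function names, the simulated data, and the choice of radii are assumptions made purely for illustration.

```python
import numpy as np

def correlation_integral(X, r):
    """Fraction of distinct point pairs in X (n x p) with distance below r."""
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    upper = np.triu_indices(len(X), k=1)  # each pair counted once
    return float(np.mean(dists[upper] < r))

def correlation_dimension(X, radii):
    """Least-squares slope of log C(r) against log r over the given radii."""
    log_r = np.log(radii)
    log_c = np.log([correlation_integral(X, r) for r in radii])
    slope, _intercept = np.polyfit(log_r, log_c, 1)
    return float(slope)

# Illustrative data (an assumption): a line segment embedded in 3 dimensions,
# so the intrinsic dimension is 1 although 3 variables are observed.
rng = np.random.default_rng(0)
t = rng.uniform(0, 1, 500)
X = np.column_stack([t, 2 * t, -t])

radii = np.array([0.05, 0.1, 0.2])  # hand-picked small radii
estimate = correlation_dimension(X, radii)
print(round(estimate, 2))
```

The estimate depends on the radii chosen, which is precisely the difficulty of the limit r tending to 0 that the abstract refers to.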
2013
Journal of Emerging Technologies in Web Intelligence, Vol 5, Iss 2, Pp 91-97 (2013)
article
intrinsic dimensionality ; fractal-based methods ; correlation dimension ; LCC:Science (General) ; LCC:Q1-390 ; LCC:Science ; LCC:Q
http://doaj.org/search?source=%7B%22query%22%3A%7B%22bool%22%3A%7B%22must%22%3A%5B%7B%22term%22%3...
Modelling beyond regression functions: an application of multimodal regression to speed-flow data
Jochen Einbeck
Gerhard Tutz
For speed-flow data, which are intensively discussed in transportation science, common nonparametric regression models of the type y = m(x) + noise turn out to be inadequate since simple functional models cannot capture the essential relationship between the predictor and response. Instead a more general setting is required, allowing for multifunctions rather than functions. The tool proposed is conditional mode estimation which, in the form of local modes, yields several branches that correspond to the local modes. A simple algorithm for computing the branches is derived. This is based on a conditional mean shift algorithm and is shown to work well in the application that is considered. Copyright 2006 Royal Statistical Society.
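The idea of tracing several branches through conditional modes can be sketched in Python as follows. This is only an illustration under stated assumptions: Gaussian kernels in both directions, hand-picked bandwidths `hx` and `hy`, and simulated two-branch data. The helper name `conditional_mode` is hypothetical, and the sketch is not claimed to reproduce the authors' algorithm in detail.

```python
import numpy as np

def gauss(u):
    """Unnormalised Gaussian kernel."""
    return np.exp(-0.5 * u ** 2)

def conditional_mode(x0, y_start, x, y, hx, hy, n_iter=200, tol=1e-8):
    """Mean shift in the y-direction at fixed x0, started from y_start.

    Each step moves the current value to a weighted mean of the responses,
    with weights from a kernel in x (around x0) times a kernel in y
    (around the current value). The iteration settles at a local mode of
    the estimated conditional density of y given x = x0.
    """
    wx = gauss((x - x0) / hx)
    y_cur = float(y_start)
    for _ in range(n_iter):
        w = wx * gauss((y - y_cur) / hy)
        y_new = float(np.sum(w * y) / np.sum(w))
        shift = y_new - y_cur
        y_cur = y_new
        if abs(shift) < tol:
            break
    return y_cur

# Simulated data (an assumption): two horizontal branches at y = 1 and y = 5.
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 400)
branch = rng.integers(0, 2, 400)
y = np.where(branch == 0, 1.0, 5.0) + rng.normal(0, 0.2, 400)

# Different starting values at the same x0 lead to different branches.
lower = conditional_mode(0.5, 0.5, x, y, hx=0.2, hy=0.4)
upper = conditional_mode(0.5, 5.5, x, y, hx=0.2, hy=0.4)
print(round(lower, 2), round(upper, 2))
```

A conditional mean at x0 = 0.5 would lie near 3, between the two branches, which is why a single regression function is inadequate for data of this kind.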
LOCAL FITTING WITH A POWER BASIS
Jochen Einbeck
Local polynomial modelling can be seen as a local fit of the data against a polynomial basis. In this paper we extend this method to the power basis, i.e. a basis which consists of the powers of an arbitrary function. Using an extended Taylor theorem, we derive asymptotic expressions for bias and variance of this estimator. We apply this method to a simulated data set for various basis functions and discuss situations where the fit can be improved by using a suitable basis. Finally, some remarks about bandwidth selection are given and the method is applied to real data. Key words: local polynomial fitting; Taylor expansion; power basis; bias reduction.
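A local fit against a power basis can be sketched as a kernel-weighted least squares problem. The Python sketch below assumes that the basis at the target point x0 consists of the powers (phi(x) - phi(x0))^j for j = 0, ..., degree, with a Gaussian kernel and a hand-picked bandwidth `h`; the function name `local_power_fit` and the test function are hypothetical choices for illustration only.

```python
import numpy as np

def local_power_fit(x0, x, y, phi, h, degree=1):
    """Kernel-weighted least squares fit at x0 against powers of phi.

    The design matrix has columns (phi(x) - phi(x0))**j, j = 0..degree.
    The fitted intercept is the estimate of the regression function at x0.
    With phi(x) = x this reduces to ordinary local polynomial fitting.
    """
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)           # Gaussian kernel weights
    d = phi(x) - phi(x0)
    B = np.vander(d, degree + 1, increasing=True)    # 1, d, d**2, ...
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(B * sw[:, None], y * sw, rcond=None)
    return float(beta[0])

# Noise-free illustration (an assumption): m(x) = sin(x).
x = np.linspace(0.5, 3.0, 200)
y = np.sin(x)
x0 = 2.0

# Local linear fit with the polynomial basis (phi = identity) is biased
# where the curve bends; with phi = sin the model is exact for this m.
err_poly = abs(local_power_fit(x0, x, y, lambda u: u, h=0.5) - np.sin(x0))
err_sin = abs(local_power_fit(x0, x, y, np.sin, h=0.5) - np.sin(x0))
print(err_poly, err_sin)
```

The comparison shows the bias reduction the abstract alludes to: when the chosen basis function matches the shape of the underlying curve, a low-degree local fit can be essentially unbiased.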
2013
Metadata may be used without restrictions as long as the oai identifier remains attached to it.
URL:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.298.7302
http://www.ine.pt/revstat/pdf/rs040201.pdf
ON DESIGN-WEIGHTED LOCAL FITTING AND ITS RELATION TO THE HORVITZ-THOMPSON ESTIMATOR
Jochen Einbeck
Abstract: Weighting is a widely used concept in many fields of statistics and has frequently caused controversies on its justification and benefit. In this paper, we analyze design-weighted versions of the well-known local polynomial regression estimators, derive their asymptotic bias and variance, and observe that the asymptotically optimal weights are in conflict with (practically motivated) weighting schemes previously proposed in the literature. We investigate this conflict using theory and simulation, and find that the problem has a surprising counterpart in sampling theory, leading us back to the discussion on the Horvitz-Thompson estimator and Basu’s (1971) elephants. In this light one might consider our results as an asymptotic and nonparametric version of the Horvitz-Thompson theorem. The crucial point is that bias-minimizing weights can make estimators extremely vulnerable to outliers in the design space and have therefore to be used with particular care. Key words and phrases: Bias reduction, Horvitz-Thompson estimator, kernel smoothing, leverage values, local polynomial modelling, nonparametric smoothing, stratification.
2013
DDC:
310 Collections of general statistics
(computed)
Metadata may be used without restrictions as long as the oai identifier remains attached to it.
Modelling beyond Regression Functions: an Application of Multimodal Regression to Speed-Flow Data
Jochen Einbeck
Gerhard Tutz
Ludwig-Maximilians-Universität
A very large number of publications deal with smoothing in the sense of nonparametric regression. However, nearly all of the literature treats the case where predictors and response are related in the form of a function y = m(x) + noise. In many situations this simple functional model does not adequately capture the essential relation between predictor and response. We show by means of speed-flow diagrams that a more general setting may be required, allowing for multifunctions instead of only functions. It turns out that in this case the conditional modes are more appropriate for the estimation of the underlying relation than the commonly used mean or median. Estimation is achieved using a conditional mean-shift procedure, which is adapted to the present situation.
2013
Subjects:
Mean shift ; Conditional density ; Conditional mode ; Speed-flow curves
Rights:
Metadata may be used without restrictions as long as the oai identifier remains attached to it.
License GPL (>= 2)
Jochen Einbeck
Ludger Evers
Lazyload Yes
Description Fitting multivariate data patterns with local principal curves; including simple tools for data compression (projection), bandwidth selection, and measuring goodness-of-fit.
2011
Rights:
Metadata may be used without restrictions as long as the oai identifier remains attached to it.
Repository CRAN
Jochen Einbeck
Ross Darnell
John Hinde
Lazyload Yes
Needscompilation No
R topics documented: npmlreg-package, alldist, dkern, fabric, family.glmmNPML, gqz, hosp, irlsuicide, missouri, plot.glmmNPML, post, predict.glmmNPML
2014
Subjects:
This program is free software ; you can redistribute it and/or mo
Rights:
Metadata may be used without restrictions as long as the oai identifier remains attached to it.
quadrature for overdispersed generalized linear models and variance component models
License GPL (>= 2)
Repository CRAN
Jochen Einbeck
Ross Darnell
John Hinde
Lazyload Yes
Needscompilation No
R topics documented: npmlreg-package, alldist, dkern, fabric, family.glmmNPML, gqz, hosp, irlsuicide, missouri, plot.glmmNPML, post, predict.glmmNPML, summary.glmmNPML, tolfind, weightslogl.calc.w
npmlreg-package: Nonparametric maximum likelihood estimation for random effect models
2013
Rights:
Metadata may be used without restrictions as long as the oai identifier remains attached to it.
astronomical
Jochen Einbeck
Ludger Evers
Coryn Bailer-Jones
Description:
Representing complex data using localized
2013
Rights:
Metadata may be used without restrictions as long as the oai identifier remains attached to it.
Representing Complex Data Using Localized Principal Components with Application to Astronomical Data
Jochen Einbeck
Ludger Evers
Coryn Bailer-Jones
Often the relation between the variables constituting a multivariate data space might be characterized by one or more of the terms: “nonlinear”, “branched”, “disconnected”, “bended”, “curved”, “heterogeneous”, or, more generally, “complex”. In these cases, simple principal component analysis (PCA) as a tool for dimension reduction can fail badly. Of the many alternative approaches proposed so far, local approximations of PCA are among the most promising. This paper will give a short review of localized versions of PCA, focusing on local principal curves and local partitioning algorithms. Furthermore we discuss projections other than the local principal components. When performing local dimension reduction for regression or classification problems it is important to focus not only on the manifold structure of the covariates, but also on the response variable(s). Local principal components only achieve the former, whereas localized regression approaches concentrate on the latter. Local projection directions derived from the partial least squares (PLS) algorithm offer an interesting trade-off between these two objectives. We apply these methods to several real data sets. In particular, we consider
2012
DDC:
310 Collections of general statistics
Rights:
Metadata may be used without restrictions as long as the oai identifier remains attached to it.
