  1. A

    Selecting the best subject's data and features to optimize the analysis

    I am not good at statistical analysis, so I am posting my case here and looking for your kind suggestions. My case: I have data from subjects, where each subject has two similar runs that were performed at different times. These data are for 78 items, which belong to two different categories...
  2. S

    PyCM: New statistical analysis library for post classification in Python

    A classifier is expected to face many datasets with different characteristics, such as being unbalanced. Besides, classifiers have different missions, for example, categorizing data into just two classes or into more than two. There are many different parameters for evaluating the performance of a...
  3. M

    Arthritis trial data: Advanced multivariate stats problem.

    I am a doctor planning a post-hoc exploratory analysis of some clinical trial data and would like some suggestions as to the best approach. Patients with arthritis have many inflamed joints; in clinical trials, doctors test 66 of these for tenderness and swelling. The result is trinary: 0; no...
  4. E

    Adequacy of logit model with oversampled data

    I fitted a binary logit model with unbalanced data which were oversampled using SMOTE. This gave an excellent ROC curve but very poor adequacy - the null hypothesis of adequacy was rejected by the Hosmer-Lemeshow test and the le Cessie – van Houwelingen – Copas – Hosmer unweighted sum of squares test...
  5. rogojel

    Support vector machines - what's the point?

    Hi, I just finished the chapter on SVMs from the Statistical Learning book by Hastie and Tibshirani. Their focus is on using SVMs for classification - and they also show that SVMs are equivalent to logistic regression with nonlinear predictors (there is even an impressive exercise to...
  6. S

    Financial Data Classification System

    Hello, I am looking at financial data (revenue, for example) over a 5-10 year time frame and want to classify the data based on the trends observed. I am looking for test(s) I can explore to help me create the trends. My statistical knowledge is founded on university-level courses of a number of years...
  7. S

    Classification System based on Statistical tests

    Hello and thanks. I am looking at financial data (revenue, for example) over a 5 year time frame and want to classify the data based on the trends observed. I am looking for test(s) I can explore to help me create the trends. My statistical knowledge is founded on university-level courses of a number...
  8. M

    eQualityCoin- What type of model is this?

    Hi everyone, I've been working on developing a crypto-currency called eQualityCoin for a while now and hoped someone here might be able to help me "classify" the system in a formal mathematical sense. The system's main feature is a simple rule for how it determines a purchaser's exchange...
  9. C

    Ranking a Location using Linear Regression

    Hi, I am facing a strange problem with a piece of software patented in the USA. It is location evaluation software that offers insights for determining the best location for starting up a gas station / petrol station. While reading the patent thoroughly, I feel that the software operates based...
  10. R

    What do you call this type of analysis?

    I do a lot of statistical analysis at my job but I'm unsure of the terminology for what I do. I work for a lab where we screen every newborn in the state for 29 genetic disorders. That's 100,000 newborns a year, and sometimes we screen certain babies twice so about 125,000 samples total. I'm...
  11. T

    chi square tests and grouping

    I have a set of 130 people that I need to divide into groups of 5 that are proportionate in academic major and gender in comparison to the overall dataset. I have been asked to run chi square analyses to make sure the groups are equally divided. Can anyone give advice as to how to do this...
  12. S

    Classification model according to multivariate presence/absence dataset

    Hi there! I am currently working on the analysis of a two part experiment looking at bacterial communities in diatoms, under different temperatures. It is quite a complicated one, and I was hoping I could get some input on how to analyze it. First, however, let me explain how it was...
  13. R

    Bagging predictions with binary response variable in R

    I am trying to use the bagging technique to increase my model's predictive power. My response variable, status, is a binary variable where 0 indicates no disease and 1 indicates disease. The variable `status` is just a vector of repeating 0's and 1's (so its class is 'numeric' not 'factor'). Not...
  14. R

    How to view the resulting tree using the bagging function in R?

    I constructed a tree with the `rpart` function. Then I can plot it to look at the tree visually and also look at what % of the observations were classified correctly using `table(predict(...), ...)`. `mytree=rpart(y~x1+x2+x3+x4, method="class")` `plot(mytree)` `text(mytree)`...
  15. R

    rpart function: how to know the % of correct classification at every terminal node?

    I have a dataset with 277 observations. I have binary response variables, i.e., 0 indicates no disease and 1 indicates disease. I know that 180 of the observations have no disease and 97 have the disease. I build a model and construct a classification tree to see how well my model correctly...
  16. A

    Classification for cloglog

    Howdy! I need to know how one can go about obtaining the overall classification, the sensitivity and specificity of a cloglog estimation in Stata. Is there an equivalent to that beautiful estat classification command that works so well in logits and probits? Many thanks!
  17. C

    Class Distribution, Boosted Models (gbm) Probability Scores

    All, Problem: I need help to better understand the probability scores that come from the result of a decision tree model. Specifically, I'm using the gbm package from R to create Generalized Boosted Regression Models, but the results I see are common across various ensemble classification...
  18. F

    Interpreting (negative) LDA classifier scores

    Hi, I performed an LDA in R using the lda() function. To my knowledge, this implements the LDA by Rao, 1948. From the results I derived the classification functions (not the discriminant functions) for each class of the model. My data is pretty fuzzy and I'd rather perform a fuzzy than a...
  19. D

    Classification with high positive predictive value

    I have a two-class classification problem. I would like to train a multivariate classifier with 100% positive predictive value. In other words, I want the model to completely avoid one of the classes. For this application a low-ish sensitivity is OK as long as PPV is ~100%. Do you have any...
  20. V

    Deriving Formula from Ordinal Regression Results to Classify New Cases?

    What is the correct method for deriving a formula from the results of an Ordinal Regression, that can be used to predict the value of the dependent variable for new cases? Thanks very much in advance to all for any info. Best, -Vik