I understand

This MVP says that the old function and the new .Inc function use a "slightly less accurate algorithm"...

Help understanding Excel's PercentRank.Inc vs .Exc

I have created a physiological score (continuous variable, values between 0 and 1) that correlates to a disease (binary variable, 0 = healthy, 1 = patient). The idea is to use this score to predict the disease (since it is measured relatively easily).

The dataset that I have available, however, is unbalanced with regard to age (healthy people are on average younger than patients). We also know that age plays...

Unbalanced confounding variable

Say the annual turnover rate (the share of employees leaving the company) is 22%.

How do I determine the probability of someone who has been employed by the company leaving after 2 years, 3 years, 4 years, 5 years, etc.?

Is there a way to derive the probability?

Assistance required
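One way to approach the turnover question above is a constant-hazard sketch. It assumes (the post does not say this) that every employee has the same 22% chance of leaving in any given year, regardless of tenure, so that tenure follows a geometric distribution. The function names are illustrative only.

```python
ANNUAL_TURNOVER = 0.22  # rate from the post; assumed constant across tenure


def prob_still_employed(years, rate=ANNUAL_TURNOVER):
    """Probability of surviving `years` full years without leaving."""
    return (1 - rate) ** years


def prob_left_by(years, rate=ANNUAL_TURNOVER):
    """Probability of having left at some point within the first `years` years."""
    return 1 - prob_still_employed(years, rate)


def prob_leaves_during(year, rate=ANNUAL_TURNOVER):
    """Probability of staying through year - 1 and then leaving in that year."""
    return prob_still_employed(year - 1, rate) * rate


for n in range(2, 6):
    print(n, round(prob_left_by(n), 4), round(prob_leaves_during(n), 4))
```

If turnover in practice depends on tenure (for example, new hires leaving more often), the constant rate would need to be replaced by year-specific rates estimated from the company's own records.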

It's been years since the last time I've had to do this. I thought it was simple, but I haven't been able to figure this one out...

.... how do I switch from long to wide format in R?

Like, say I have this:

Code:

```
x <- rnorm(100)
g <- rep(1:10, each=10)
df <- data.frame(x,g)
```

Code:

`> reshape(data=df, timevar="g", direction="wide")...`

I'm really having a tough time with this exercise and I don't even know where I am supposed to start! Could you help me?

There were 1000 citizens in a small town. Because of a disease, 811 of them got vaccinated. However, 200 citizens died and 110 of them were vaccinated.

(a) Test if the vaccine prevents death significantly

(b) Test if the vaccine increases the probability of surviving by more than 30 percentage points.

Thank you!
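As a possible starting point for the exercise above, here is a sketch using two-proportion z-tests with the normal approximation. This is one common choice and not necessarily the method the exercise expects; the variable names are illustrative. The counts come from the post: 811 vaccinated (110 died) and 189 unvaccinated (90 died).

```python
import math
from statistics import NormalDist

died_vax, n_vax = 110, 811
died_unvax, n_unvax = 200 - 110, 1000 - 811

surv_vax = 1 - died_vax / n_vax
surv_unvax = 1 - died_unvax / n_unvax
diff = surv_vax - surv_unvax  # observed gain in survival probability

# (a) H0: equal survival, H1: vaccinated survive more often.
# Under H0 both groups share one survival rate, so the standard error is pooled.
pooled = (1000 - 200) / 1000
se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_vax + 1 / n_unvax))
z_a = diff / se_pooled
p_a = 1 - NormalDist().cdf(z_a)

# (b) H0: gain <= 0.30, H1: gain > 0.30.
# The null no longer says the rates are equal, so the standard error is unpooled.
se_unpooled = math.sqrt(
    surv_vax * (1 - surv_vax) / n_vax + surv_unvax * (1 - surv_unvax) / n_unvax
)
z_b = (diff - 0.30) / se_unpooled
p_b = 1 - NormalDist().cdf(z_b)

print(round(diff, 4), round(z_a, 2), p_a, round(z_b, 2), round(p_b, 3))
```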

The situation appears when you test a measurement instrument by measuring on a certified reference sample with a given nominal value and measurement uncertainty (=standard deviation).

In my case I have made 30 repeated measurements on a certified reference sample, so my own sample size is known.

I have calculated the average and standard deviation of my sample of 30 measurements...

Hypothesis testing a sample vs an unknown sample size

I would first like to check that the tests I plan to run are correct. I am also not sure about the intraclass correlation...

Interpreting intraclass correlation

Many thanks in advance for any advice.

I am currently working with panel data to see whether there is a correlation between sustainability and performance in the energy and materials sector of the S&P 500.

I ran the regression twice, once with the logarithm of MarketValue (=MarketValue.WINS.LOG) and once without (=MarketValue.WINS).

The outcomes differ, and I really wonder whether I should go for the log MarketValue or not.

The regression with log MarketValue has an R^2 that is 0.1 to 0.2 higher, but the other variables are not...

R^2 vs. significance of the variables

First post on here, so hope it's not too incoherent.

Most sources I've looked at in trying to learn about statistics and probability talk about Type 1 errors as cases where we falsely attribute significance and state that the chance of this is the alpha value we decide when determining the significance of a test (e.g. 5% for a 95% confidence interval). It might sound bizarre, but I actually struggle with this.

At one level I get it - if we are designing a test in advance and we...

Why is chance of a Type 1 error equal to alpha?
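A small simulation can make the link between alpha and the Type 1 error rate more concrete. The sketch below assumes a two-sided z-test of a zero mean with known standard deviation 1, and draws every sample from a world where the null hypothesis is true. The share of experiments that reject should then land close to alpha. The setup and names are illustrative, not taken from the post.

```python
import math
import random
from statistics import NormalDist

ALPHA = 0.05
rng = random.Random(42)  # fixed seed so the run is repeatable
critical = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-sided cutoff, about 1.96


def one_null_experiment(n=20):
    """Draw n values from N(0, 1), so H0 (mean = 0) is true, and test it."""
    sample = [rng.gauss(0, 1) for _ in range(n)]
    z = (sum(sample) / n) / (1 / math.sqrt(n))
    return abs(z) > critical  # True means a false positive


trials = 20000
false_positive_rate = sum(one_null_experiment() for _ in range(trials)) / trials
print(false_positive_rate)
```

The cutoff is chosen in advance so that, when the null is true, the statistic exceeds it with probability alpha. That design choice is what ties the two numbers together.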

But what if those inputs were free to be any value across the range 0..1? For the sake of example let's say A=2/3 and B=1/4

I can surmise that NOT(A) = 1 - 2/3 = 1/3, and also that NOT(B) = 1 - 1/4 = 3/4

But what of AND and OR?
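The answer depends on what the values between 0 and 1 are taken to mean, which the post leaves open. The sketch below shows two common conventions for A = 2/3 and B = 1/4: treating them as probabilities of independent events (an assumption), and treating them as fuzzy truth values with the min/max operators. Both agree with NOT(x) = 1 - x.

```python
from fractions import Fraction

A = Fraction(2, 3)
B = Fraction(1, 4)

# Convention 1: probabilities of independent events (independence is assumed here).
and_prob = A * B          # P(A and B)
or_prob = A + B - A * B   # P(A or B), by inclusion-exclusion

# Convention 2: fuzzy truth values with the min/max operators.
and_fuzzy = min(A, B)
or_fuzzy = max(A, B)

print(and_prob, or_prob, and_fuzzy, or_fuzzy)
```

If A and B are probabilities but not independent, the product rule no longer applies and the joint probability has to be supplied separately.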

I am a bit stumped with the research I am conducting and wanted to ask for some advice/insight.

My study involved having 4 individuals complete the same 3 measures at pre- and post-test. So each individual completed each measure before receiving an intervention, and then completed the same measures again after receiving the intervention. Ordinarily, this would be a t-test looking at pre/post differences across individuals. However, due to the small sample size, the normality of my...

Wilcoxon Signed Rank Test for Small Sample Size
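One point worth checking before running the test with only 4 individuals is the smallest p-value the exact signed-rank test can produce. The sketch below enumerates every sign pattern for n paired differences, assuming no ties and no zero differences, and reports the two-sided p-value of the most extreme outcome. The function name is illustrative.

```python
from itertools import product


def min_two_sided_p(n):
    """Smallest two-sided exact p-value of the signed-rank test for n pairs."""
    ranks = range(1, n + 1)
    # W+ (sum of ranks with positive sign) for each of the 2**n sign patterns,
    # all equally likely under the null hypothesis.
    w_plus = [
        sum(r for r, positive in zip(ranks, signs) if positive)
        for signs in product([False, True], repeat=n)
    ]
    max_w = n * (n + 1) // 2
    extreme = sum(1 for w in w_plus if w == 0 or w == max_w)
    return extreme / len(w_plus)


print(min_two_sided_p(4))
print(min_two_sided_p(6))
```

With 4 pairs the minimum is 2/16 = 0.125, so a two-sided test at the 0.05 level cannot reach significance however consistent the pre/post changes are.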

I'm struggling with data analysis for cell categories. Here's the context:

I have 5 categories of cells: 2 in vitro, and 3 extracted from animal models at different times post-infection.

I have Early In Vitro / Late In Vitro / 1 day post-infection / 2 days post-infection / 3 days post-infection (each sample is from a different animal, 1 animal gives only 1 value)

I performed a metabolic assay and I want to know if cells extracted at 3 days post-infection are...

Multiple comparisons analysis

Sir:

I am currently teaching an introductory course in statistics, and a student asked me a question that confuses me. Let’s say we wish to compare the mean income of residents of two cities. We sample residents randomly in City A, while we sample systematically in City B. Can we reliably compare the means of these cities? I would say yes, as long as each sample represents its respective city well. Do you agree with this answer?

Best regards,

Yves Claveau

Using two different sampling methods when comparing means. A no go?

Confused about what type of ANOVA is required

I am quite new to R and not much of a pro in statistics, so sorry if my question is very naive.

I would like to do a meta-analysis with several studies for which I have the raw data. The research question was the same but the design of the studies was different. To take these differences into account, I performed a GLMM for each study to extract the LSmeans and SEs.

I then wanted to perform a meta-analysis with the meta package in R using the LSmeans and SEs. However, with the...

Meta-analysis with metacont (R) LSmeans + SDs or SEs?

I'd like some guidance, since I don't know exactly what statistic would be more adequate in my case.

I have a dataset that includes several categories and only one relevant numerical variable, and each category is repeated and has several measurements. I have around 1500 data points that correspond to ~20 categories (e.g. imagine we have a list of people's heights in centimetres and the category is the country of that person).

I would like to find a way to group these categories into 5...

Clustering univariate measures

I tried using the "

I am confused about two things; it would be great if anyone could help.

1: Does the character type in a video game influence the duration of gaming (in minutes)? Each participant played a video game four times with different characters: warrior, healer, bard, and scout. Each participant’s gaming duration was measured in minutes under each of the four characters.

About the Friedman test: the DV should be ordinal. If interval or ratio, should be...

Non-parametric tests

I am wondering whether composite reliability is designed for checking reliability or validity?

I'd like to have a confirmation on the correctness of the following interpretation:

Let's say that we want to run a very simple regression like the following one:

We are regressing two I(1) series, since x and y are both assumed to be described by a random walk process. The errors of these 2 processes are uncorrelated.

Granger and Newbold showed that in this case, we...

Spurious Regression with non-stationary time-series
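The spurious regression effect can be reproduced with a short simulation. The sketch below assumes the regression is a simple OLS of y on x with an intercept (the equation itself is missing from the post), generates pairs of independent Gaussian random walks, and counts how often the usual t-test on the slope rejects at roughly the 5% level. All names and settings are illustrative.

```python
import math
import random


def random_walk(n, rng):
    """Cumulative sum of independent N(0, 1) shocks, i.e. an I(1) series."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0, 1)
        path.append(level)
    return path


def slope_t_stat(x, y):
    """Classical OLS t-statistic for the slope in y = a + b*x + e."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    rss = sum((b - alpha - beta * a) ** 2 for a, b in zip(x, y))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return beta / std_err


rng = random.Random(0)
trials, length = 500, 100
rejections = sum(
    abs(slope_t_stat(random_walk(length, rng), random_walk(length, rng))) > 1.96
    for _ in range(trials)
)
spurious_rate = rejections / trials
print(spurious_rate)
```

Although x and y are unrelated by construction, the rejection rate comes out far above the nominal 5%, which is the pattern Granger and Newbold described.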

This might be a basic question, but I have a dataset that measures items on 3 distinct measures (say time, speed, and distance). I’m attempting to find a combined ranking of these. Is it acceptable to take the quartile ranking of each measure and then find the average quartile ranking across the three measures?

In other words one item might have quartile rankings of 1 (time), 2 (speed), and 3 (distance). The average quartile ranking would then be 2. I’m just not sure if that’s...

Quartile average question

This dataset, in which is studied the number of calories emitted during exercise on

Help! What analysis should be performed on a dataset like this?

I'm new to statistics. I have some basic questions on hypotheses and Type I and Type II errors. I'm posting two scenarios on the same topic below. I need your help to understand the 2nd scenario.

Let us say

But due to government terms and conditions, I'm giving my blood sample for testing. Based on the test results, below are the Type I and Type II errors

Basic question on hypotheses and Type I and II errors

I am working on the following file. I want to categorize absenteeism at my company by age, seniority, diagnosis, machine group, and month. I will use these parameters in a multiple regression analysis, but when I separate them as follows, I cannot satisfy the assumptions required for regression. How can I add dummy variables to this file more effectively?

...

How are multi-categorical variables used in multiple regression analysis?
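As a rough illustration of dummy coding, the sketch below turns one categorical variable into 0/1 indicator columns and leaves out a reference level, which avoids perfect collinearity with the intercept. The machine group labels are hypothetical, since the file from the post is not shown, and `dummy_encode` is an illustrative helper, not a library function.

```python
def dummy_encode(values, reference):
    """Return one dict of 0/1 indicators per row, omitting the reference level."""
    levels = sorted(set(values) - {reference})
    return [{f"is_{level}": int(value == level) for level in levels} for value in values]


# Hypothetical machine groups; the real categories are in the poster's file.
machine_group = ["A", "B", "C", "A", "C"]
rows = dummy_encode(machine_group, reference="A")

for label, row in zip(machine_group, rows):
    print(label, row)
```

A variable with k categories yields k - 1 indicator columns, and each coefficient is then read as a difference from the reference category.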

In Life

Some mock me for doing statistics

Some loathe me and statistics

Some don’t understand what statistics are

Why is it that statistics

Put a calm smile on my face?

Because of statistics I can solve the deepest mysteries

Because of statistics I will not be lonely again, playing in the data

Because of statistics I can rearrange the stars in the skies above

(by Chinese statistician Wang Jiaowei [translated],

The...

Statistics Poetry

"Given the company’s performance record and based on the empirical rule of normal distribution (also known as the 68%-95%-99.7% rule), what would be the

Here is the key data from the sheet, which has all been confirmed as correct...

Std Dev Bounds

I have a dataset with n = 100 000 rows and p = 2 variables, X and Y.

There is a trend between these two variables; however, it is very blurry and we can't see anything (too many points).

My strategy is to use a clustering algorithm (K-Means for example) on the 100 000 rows and to classify them into 1000 clusters (the purpose is to catch the dispersion of the whole data). As you know, I can calculate the "center" of each cluster.

Afterwards, I plot only the 1000 centers on a graph...

Reduce the dimension (rows) with clustering
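The strategy described above can be sketched with a plain Lloyd-style K-Means in the standard library. The data here is synthetic (a noisy linear trend standing in for the real X and Y), and the sizes are scaled down from 100 000 rows and 1000 clusters so the example runs quickly. `kmeans_centers` is an illustrative helper, not a library function.

```python
import random


def kmeans_centers(points, k, iterations=10, seed=0):
    """Basic Lloyd iterations on 2-D points; returns the k cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise with k distinct data points
    for _ in range(iterations):
        sums = [[0.0, 0.0, 0] for _ in range(k)]
        for x, y in points:
            # Index of the nearest center by squared Euclidean distance.
            j = min(
                range(k),
                key=lambda c: (x - centers[c][0]) ** 2 + (y - centers[c][1]) ** 2,
            )
            sums[j][0] += x
            sums[j][1] += y
            sums[j][2] += 1
        # Move each center to the mean of its points; keep it if the cluster is empty.
        centers = [
            (sx / count, sy / count) if count else centers[i]
            for i, (sx, sy, count) in enumerate(sums)
        ]
    return centers


# Synthetic stand-in for the real data: y follows x with heavy noise.
rng = random.Random(1)
points = []
for _ in range(2000):
    x = rng.uniform(0, 10)
    points.append((x, 0.5 * x + rng.gauss(0, 2)))

centers = kmeans_centers(points, k=20)
print(len(centers), centers[:3])
```

Plotting only the centers discards how many points each one represents, so it may be worth keeping the cluster sizes alongside them.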

I am writing my Master's thesis on the concept of the 4-day work week, in which employees work 4 days a week instead of 5. Since the concept is relatively new, mine is an exploratory study.

In this study there are three parts -

Part 1 has its own separate research question...

What types of statistical tests/analysis can I do here?

I'm currently completing my dissertation for a masters in a health field (hopefully). Once done I'm interested in additional education in statistics. It's obviously a weak point of mine. My initial biostats course was years ago so looking for something to dig into to help further in the medical field. Any suggestions on graduate certificates or other similar training? Online probably best. I'm based in Canada if it matters. Of note, I'm not looking for a masters in stats or...

Education

I am conducting a retrospective cohort analysis and will need to compare the demographic data between the cohorts to check if they are the same or if any categories significantly differ. For age I plan to use the mean of each group and utilize a t-test (or z-test).

In order to assess whether the groups (call them HEMS and GEMS) have significantly different proportions of males vs females, what's the best way to go? Is there a simple way to compare the two groups, with a null...

Sorry for yet another beginner question...

I am doing a research project for my Master's degree course.

One of my Research Questions in my Project is:

How do employees in <”name of country”> perceive the concept of <”an upcoming concept related to how employees work”> ?

I need to conduct a survey in order to answer the above question. I have already prepared the questions but have not yet started the survey itself.

I was told by my professor that I need to do statistical analysis as well.

I have never...

What tests to carry out when all variables are categorical?

The column of the variable wl (work level) contains values which go from 13 to 56. All observations are discrete EXCEPT for two observations which have continuous values (34.5 and 34.5). Given this, I chose to consider this...

Should I consider this variable continuous or discrete?

My question: is there any relationship between those measures?

Fisher exact test =

Question: How significant is the Fisher exact test result of 0.037?

The hypothesis would be: Medication A can lead to illness Z, because of p = 0.037?

The odds ratio would be 6.327; is that significant as well?

Thanks!

Do you have any clue what kind of model this is and how it can be adapted in R?

I'm trying to do a quick and rough calculation but have zero desire to pump more meaningless 'statistics' into the world. So wanted to get a sense check here, please.

I'm measuring something once a day, every day. I take a 7-day moving average up to and including yesterday. I take today's measure. I want to know if today's number is any indication that things are starting to change.

My immediate thought was merely to compare day 8's value with the SD of the 7 days and use the...

Moving Averages
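One reading of the idea above is to express today's value as a number of standard deviations away from the mean of the previous 7 days. The sketch below does that with made-up daily values. The helper name and the data are hypothetical, and the post's own formula is cut off, so this may differ from what the author had in mind.

```python
from statistics import mean, stdev


def z_vs_window(window, today):
    """How many sample standard deviations today's value sits from the window mean."""
    return (today - mean(window)) / stdev(window)


# Hypothetical measurements for the previous 7 days and for today.
last_seven = [10, 12, 11, 13, 12, 11, 12]
today = 16

z = z_vs_window(last_seven, today)
print(round(z, 2))
```

A standard deviation estimated from only 7 points is itself noisy, and consecutive days are often correlated, so a large value here is a prompt to look closer instead of proof of a change.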

I am Abdiel and I am a tech enthusiast. I would like to introduce my site here which contains the latest and updated info about technology. Most of the articles of my website are related to Windows and these may help everyone to increase the power of knowledge in technology. Here is my website. If anyone has any interest to know anything or any query about my website, just go to the contact page of my website and can...

Greetings To All My Friends!

ANOVA after PCA

I have 956 respondents, and my X (indep. var) has 4 cases.

The

SPSS-PROCESS-HAYES Mediation-Model 84

From what I can understand, the Annual Personal Outcome is the DV and there are several IVs (e.g. gender, sexual orientation, etc.). As such, has a multiple linear regression been conducted here?

I'm struggling to understand why some of the IVs are dichotomous (gender: male/female) and some are singular (bisexual).

Sex: Female (ref.: Male) -5,068.737***

Does this mean that females earn less...

Multiple Regression

There are Y tests in total, and you can either pass or fail each one.

I have data on all of the X students, but so far not every student has taken every test: some have taken all Y, some have taken Y-1, some Y-2, etc., down to 1.

I want to know how I’d calculate

1.) the probability that a future student passes all Y tests

2.) the expected total number of fails for a future student

To work it out I thought about just treating each exam as an...

Probability of test passes
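A minimal sketch of the per-exam approach: estimate each test's pass rate from the students who have actually taken it, then combine the rates. The counts below are hypothetical. The probability of passing every test is a product only if the tests are treated as independent (an assumption), while the expected number of fails is a plain sum that holds with or without independence.

```python
import math

# Hypothetical (passes, attempts) per test, counting only students who sat that test.
results = {"test_1": (45, 50), "test_2": (30, 40), "test_3": (18, 20)}

pass_rates = {name: passed / taken for name, (passed, taken) in results.items()}

# 1.) P(pass all tests), assuming outcomes on different tests are independent.
p_pass_all = math.prod(pass_rates.values())

# 2.) Expected number of fails, by linearity of expectation.
expected_fails = sum(1 - rate for rate in pass_rates.values())

print(round(p_pass_all, 4), round(expected_fails, 2))
```

If students who skip a test differ systematically from those who take it, or if strong students tend to pass everything, these estimates would be biased and the independence assumption would need revisiting.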

I'll start with an article about how we share our data:

http://gigaom.com/2014/01/18/you-dont-want-your-privacy-disney-and-the-meat-space-data-race/