# How to test likelihood hypothesis on dataset?

#### IIIIOOOO

##### New Member
How to test the following hypothesis? Customers with larger fares are more likely to be travelling alone than customers with smaller fares.
Using the data below, is a common statistical test appropriate for this hypothesis?

Given that we're comparing different segments (high vs. low) of fare, surely tests using means aren't helpful? If so, is there an alternative to a statistical test for validating this hypothesis?

Common statistical tests:
> T-test: compare two groups/categories of numeric variables with small sample size
1. one sample t-test: test the mean of one group against a constant value
2. two sample t-test: test the difference of means between two groups
3. paired sample t-test: test the difference of means between two measurements of the same subject
> Z-test: compare two groups/categories of numeric variables with large sample size
> ANOVA test: compare the difference between two or more groups/categories of numeric variables
> Chi-Squared test: examine the relationship between two categorical variables
> Correlation test: examine the relationship between two numeric variables

Code:
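# package
import seaborn as sns

# load the data (assumption: the columns and values shown match seaborn's built-in titanic dataset)
df = sns.load_dataset('titanic')
df = df[['fare', 'alone']]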

# dataset (first five rows)
   fare     alone
0   7.2500  False
1  71.2833  False
2   7.9250  True
3  53.1000  False
4   8.0500  True

#### Karabiner

##### TS Contributor
> How to test the following hypothesis? Customers with larger fares are more likely to be travelling alone than smaller ones.
> (...)
> Given we're comparing different segments (high vs low) of fare,

That would be sad, because it would waste statistical information and lead to silly groupings (e.g.
why is someone with a medium fare grouped together with someone who has an extremely high fare,
but in a different group than someone just a little bit below the middle?). If you were ordered to
categorize that way, then you have 2 categorical variables with 2 levels each, i.e. a 2x2 crosstabulation,
and can use a Chi² test or Fisher's exact test.
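
A minimal sketch of that categorized route, assuming pandas and SciPy are installed. The small data frame is invented purely for illustration (it is not the poster's data), and the `high_fare` column with its median cut-off is an assumption; any other threshold would be just as arbitrary. `pd.crosstab` builds the 2x2 table, which is then handed to `scipy.stats.chi2_contingency` and `scipy.stats.fisher_exact`.

```python
import pandas as pd
from scipy import stats

# Toy data, invented for illustration only (NOT the poster's dataset).
df = pd.DataFrame({
    "fare":  [7, 8, 8, 9, 10, 13, 15, 26, 30, 53, 71, 80],
    "alone": [True, True, False, True, True, False,
              False, True, False, False, False, True],
})

# Assumption: split fares at the median into "low" (False) and "high" (True).
df["high_fare"] = df["fare"] > df["fare"].median()

# 2x2 crosstabulation: rows = high_fare (False, True), columns = alone (False, True).
table = pd.crosstab(df["high_fare"], df["alone"])
print(table)

# Chi-squared test of independence (SciPy applies Yates' correction to 2x2 tables by default).
chi2, p_chi2, dof, expected = stats.chi2_contingency(table.to_numpy())
print(f"Chi2 = {chi2:.3f}, dof = {dof}, p = {p_chi2:.3f}")

# Fisher's exact test, usually preferred when expected cell counts are small.
# alternative="greater" tests whether the odds of travelling alone are higher
# in the high-fare row than in the low-fare row.
odds_ratio, p_fisher = stats.fisher_exact(table.to_numpy(), alternative="greater")
print(f"Odds ratio = {odds_ratio:.3f}, one-sided p = {p_fisher:.3f}")
```

With the real data, the toy frame would be replaced by the `df` that holds `fare` and `alone`. The Chi² test is non-directional, whereas the one-sided Fisher test matches the direction stated in the hypothesis.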

If you use fare just as it is, i.e. uncategorized, then you can compare it between those travelling alone
vs. those not travelling alone using a two-sample test for independent groups (t-test or Mann-Whitney U test).
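
A rough sketch of the uncategorized comparison, again assuming pandas and SciPy (version 1.6 or later for the `alternative` argument of `ttest_ind`). The data frame is invented for illustration and is not the poster's data; the names `fare_alone` and `fare_not_alone` are hypothetical. The t-test is run in Welch's form (`equal_var=False`), which is a choice made here because it does not assume equal variances in the two groups.

```python
import pandas as pd
from scipy import stats

# Toy data, invented for illustration only (NOT the poster's dataset).
df = pd.DataFrame({
    "fare":  [7, 8, 8, 9, 10, 13, 15, 26, 30, 53, 71, 80],
    "alone": [True, True, False, True, True, False,
              False, True, False, False, False, True],
})

# Split the raw (uncategorized) fares by the travelling-alone flag.
fare_alone = df.loc[df["alone"], "fare"]
fare_not_alone = df.loc[~df["alone"], "fare"]

# Welch's two-sample t-test; alternative="greater" asks whether the mean fare
# of those travelling alone is larger than that of those not travelling alone.
t_stat, p_t = stats.ttest_ind(fare_alone, fare_not_alone,
                              equal_var=False, alternative="greater")
print(f"t = {t_stat:.3f}, one-sided p = {p_t:.3f}")

# Mann-Whitney U test: rank-based, so less sensitive to the strong right skew
# that fare data typically shows.
u_stat, p_u = stats.mannwhitneyu(fare_alone, fare_not_alone, alternative="greater")
print(f"U = {u_stat:.1f}, one-sided p = {p_u:.3f}")
```

Both calls take the "alone" group first, so the one-sided alternative reads as "fares are larger among those travelling alone", which is the direction of the stated hypothesis.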

With kind regards

Karabiner