1
Preface
2
Introduction to R
2.1
Start RStudio
2.2
Using R
2.2.1
Start of the system:
2.2.2
Simplest use: R as a calculator
2.2.3
Multiple commands are separated by ;
2.2.4
Using functions:
2.3
Getting help
2.4
Assignment of data to variables
2.4.1
Arrow or equal sign?
2.5
Working with variables
2.6
Using variables
2.7
Data types of R variables
2.7.1
Scalar
2.7.2
Vector
2.7.3
Matrix:
2.7.4
Data frame:
2.7.5
The working directory
2.7.6
Download data for further tasks into your working directory
2.8
Data import by reading files
2.9
Using c() for data entry
2.10
Named vectors
2.11
Applying functions to more complex variables
2.12
Calculations with vectors
2.13
Sequences and repeated data
2.14
Data access
2.14.1
by index/position
2.15
Logical values
2.16
Factors
2.17
Missing (NA) values
2.18
Matrices
2.19
Data frames
2.20
Data export through save
3
Exploratory statistics & graphical display
3.1
Dataset used for this chapter
3.2
Cross tables (contingency tables)
3.3
Basics about charts
3.4
The command plot()
3.4.1
Basic plotting
3.4.2
Enhancing the plot with optional components & Text
3.5
Export the graphics
3.6
Pie chart
3.7
Bar plot
3.8
Box-plot (Box-and-Whiskers-Plot)
3.9
Scatterplot
3.10
Histogram
3.11
Stem-and-leaf chart
3.12
Kernel smoothing (kernel density estimation)
3.13
Guidelines
3.13.1
Stay honest!
4
Descriptive Statistics
4.1
Introduction
4.1.1
Parameters of distributions
4.1.2
Loading data for the following steps
4.2
Central tendency
4.2.1
Mean
4.2.2
Median
4.2.3
Mode
4.2.4
Comparing Central tendency parameters
4.3
Dispersion
4.3.1
Range
4.3.2
Towards a better parameter for dispersion
4.3.3
(empirical) variance
4.3.4
(empirical) standard deviation
4.3.5
Coefficient of variation
4.3.6
Quantile
4.4
Shape of the distribution
4.4.1
Skewness
4.4.2
Kurtosis
4.5
Take Home
5
Nonparametric Tests
5.1
Inductive statistics or statistical inference
5.2
Population and sample
5.2.1
Repetition:
5.2.2
Parameter
5.3
Statistical Hypothesis testing
5.3.1
Validation of an assumption about the population
5.3.2
Null hypothesis
5.3.3
One-tailed/Two-tailed hypothesis
5.3.4
Statistical significance
5.3.5
How true is true?
5.4
α- and β-error
5.5
Parametric vs. Nonparametric
5.6
\(\chi^2\) test
5.6.1
Fact sheet
5.6.2
Excursus: degrees of freedom
5.6.3
\(\chi^2\) test for one sample (example after Shennan)
5.6.4
Two sample case (Test for independence)
5.7
Kolmogorov–Smirnov test
5.7.1
Fact sheet KS-Test
5.7.2
Example (after Shennan)
5.7.3
KS-Test in R
5.8
Interpretation of significance tests
5.8.1
Pay attention even when the statistics seem to be clear
5.8.2
Statistical association does not mean causal association!
6
Basic Probability Theory
6.1
Repetition
6.1.1
Population and sample
6.1.2
Null hypothesis
6.2
The concept of probability
6.2.1
Some Notations and Definitions
6.2.2
Classical Probability Definition by Laplace
6.2.3
Kolmogorov’s probability axioms
6.2.4
Conditional and independent events
6.2.5
Addition law of probability
6.3
Combinatorics
6.3.1
How many possibilities are there to combine 2 dice results?
6.3.2
How many possible unique Lotto tickets (6 of 42) are there?
6.4
Law of large numbers
6.5
Random variables
6.5.1
What is random at all?
6.5.2
Example for recoding (after Dolić)
6.6
Building a statistical test from scratch
6.6.1
Number of possible events
6.6.2
Number of positive events
6.6.3
From counts to probabilities
6.7
The Binomial Distribution
6.7.1
The binom.test
6.7.2
Sample size
7
Parametric tests
7.1
Introduction
7.2
The parametric in parametric tests
7.3
Test for normal distribution
7.3.1
The good old KS-Test
7.3.2
The Shapiro–Wilk test
7.3.3
Visual: QQ-plot
7.4
F-Test
7.4.1
Fact sheet F-Test
7.4.2
Calculation by hand
7.4.3
F-Test in R
7.5
t-Test
7.5.1
Fact sheet t-Test (homogeneous variances)
7.5.2
Calculation ‘by hand’
7.5.3
t-test in R
7.5.4
Welch-Test (t-Test for inhomogeneous variances with Welch correction)
7.6
Multiple Tests
7.6.1
The Problem
7.6.2
p value correction
7.7
ANOVA
7.7.1
Fact sheet ANOVA
7.7.2
Doing ANOVA by hand
7.7.3
ANOVA in R, more advanced topics
7.8
Summary
8
Regression Analysis and Correlation
8.1
What is a regression? Some Terms and Definitions
8.2
Calculating the regression
8.2.1
Linear Equation
8.2.2
Least-squares method
8.3
Correlation
8.3.1
Correlation coefficient in theory
8.4
Correlation test
8.5
Correlation of ordinally scaled variables
8.5.1
Kendall’s \(\tau\) (tau) fact sheet
8.5.2
Kendall’s \(\tau\) (tau) in practice
8.6
Correlation does not imply causation
9
Cluster Analysis
9.1
Idea and Basics
9.2
Methods
9.2.1
Hierarchical approach
9.2.2
Partitioning approach
9.2.3
Hierarchical and partitioning: advantages and disadvantages
9.3
Distance calculations
9.3.1
How the crow flies (Euclidean distance (metric variables))
9.3.2
How the taxi drives (Manhattan distance)
9.3.3
When distances can no longer be calculated (non-metric variables, presence/absence matrices)
9.4
Distance calculation in R
9.5
Hierarchical clustering
9.5.1
Cluster analysis by hand
9.5.2
Dendrogram
9.5.3
Other methods
9.5.4
Practical implementation in R
9.5.5
With metric data (Distance matrix)
9.5.6
With nominal data (Dissimilarity matrix)
9.6
Number of Clusters
9.6.1
Tree care and visualisation
9.7
Non-hierarchical clustering
9.8
Conclusion and transition
Statistical methods for archaeological data analysis
Martin Hinz
1
Preface
Hello World!