Chapter 28: R Data Structures
Think of data structures as different types of containers, each designed to hold and organize data in a specific way. Just as you wouldn’t store soup in a colander or carry groceries in a thimble, choosing the right data structure for a task matters for writing efficient, readable code.
Part 1: Overview of R Data Structures
R has several core data structures, each with its own characteristics. All of them are built into base R except the tibble, which comes from the tidyverse's tibble package:
| Structure | Dimensions | Homogeneous | Can hold different types? |
|---|---|---|---|
| Vector | 1D | Yes | No |
| Matrix | 2D | Yes | No |
| Array | nD | Yes | No |
| List | 1D | No | Yes |
| Data Frame | 2D | No (each column is homogeneous) | Yes (across columns) |
| Factor | 1D | Yes | Categorical only |
| Tibble | 2D | No | Yes (tidyverse) |
Let’s explore each one in detail with plenty of examples.
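As a quick orientation before the details, here is a minimal sketch that builds one small instance of each base structure from the table above and inspects it with `typeof()` and `dim()`. The variable names are illustrative only, and the tibble is left out because it needs the tibble package to be installed.

```r
# One small instance of each base structure (names are illustrative)
v  <- c(1.5, 2.5, 3.5)                       # atomic vector
m  <- matrix(1:6, nrow = 2)                  # matrix
a  <- array(1:24, dim = c(2, 3, 4))          # array
l  <- list(1, "a", TRUE)                     # list
df <- data.frame(x = 1:2, y = c("a", "b"))   # data frame
f  <- factor(c("low", "high", "low"))        # factor

# typeof() reports the underlying storage type
print(typeof(v))    # "double"
print(typeof(m))    # "integer"
print(typeof(l))    # "list"
print(typeof(df))   # "list"    (a data frame is a list of columns)
print(typeof(f))    # "integer" (a factor stores integer codes plus levels)

# dim() reports the shape; plain vectors and lists have none
print(dim(v))       # NULL
print(dim(m))       # 2 3
print(dim(a))       # 2 3 4
print(dim(df))      # 2 2
```

The `typeof()` results hint at how the structures relate: matrices and arrays are vectors with a shape attached, and data frames are lists whose elements are equal-length columns.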
Part 2: Vectors – The Building Blocks
Vectors are the simplest and most fundamental data structure in R. They are 1-dimensional sequences of elements that must all be the same type.
Creating Vectors
```r
# Numeric vector
numbers <- c(1, 2, 3, 4, 5)
print(numbers)

# Character vector
fruits <- c("apple", "banana", "orange")
print(fruits)

# Logical vector
flags <- c(TRUE, FALSE, TRUE, TRUE)
print(flags)

# Sequence vectors
seq1 <- 1:10                              # Integers from 1 to 10
seq2 <- seq(from = 0, to = 100, by = 10)  # 0, 10, 20, ..., 100
seq3 <- rep(c(1, 2, 3), times = 3)        # Repeat 1, 2, 3 three times
seq4 <- rep(c(1, 2, 3), each = 3)         # Each element repeated 3 times

print(seq1)
print(seq2)
print(seq3)
print(seq4)
```
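A few related sequence helpers are worth knowing alongside `1:10`, `seq()`, and `rep()`. The sketch below is an illustrative addition with made-up variable names. It shows `seq()` with `length.out`, and why `seq_len()` and `seq_along()` are usually safer than `1:n` when `n` might be zero.

```r
# seq() with length.out fixes the number of points instead of the step size
s1 <- seq(from = 0, to = 1, length.out = 5)
print(s1)                   # 0.00 0.25 0.50 0.75 1.00

# seq_len(n) counts from 1 to n and is empty when n is 0
print(seq_len(3))           # 1 2 3
print(length(seq_len(0)))   # 0

# 1:n counts DOWN when n is 0, a common source of loop bugs
print(1:0)                  # 1 0

# seq_along(x) gives the valid indices of x
fruits <- c("apple", "banana", "orange")
print(seq_along(fruits))    # 1 2 3
```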
Vector Operations
```r
# Arithmetic operations (element-wise)
v1 <- c(1, 2, 3)
v2 <- c(4, 5, 6)

print(v1 + v2)  # 5 7 9
print(v1 * v2)  # 4 10 18
print(v1^2)     # 1 4 9

# Recycling (the shorter vector is reused from its start)
v3 <- c(1, 2)
print(v1 + v3)  # 2 4 4, with a warning: v3 is recycled to 1 2 1,
                # and length 3 is not a multiple of length 2

# Vector functions
v <- c(10, 20, 30, 40, 50)
print(length(v))  # 5
print(sum(v))     # 150
print(mean(v))    # 30
print(sd(v))      # Standard deviation (about 15.81)
print(min(v))     # 10
print(max(v))     # 50
print(range(v))   # 10 50
```
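Recycling is easier to see when the lengths divide evenly. The following sketch uses hypothetical vectors to contrast the silent case with the case where R warns. The `suppressWarnings()` call is there only so the example runs quietly.

```r
v1 <- c(1, 2, 3, 4, 5, 6)
v2 <- c(10, 20)

# Length 6 is a multiple of 2, so v2 is silently reused as 10 20 10 20 10 20
print(v1 + v2)    # 11 22 13 24 15 26

# A single number is a length-1 vector, recycled across every element
print(v1 * 2)     # 2 4 6 8 10 12

# Length 3 is not a multiple of 2: R still returns a result but warns
v3 <- c(1, 2, 3)
result <- suppressWarnings(v3 + c(1, 2))
print(result)     # 2 4 4
```

Because the uneven case still produces a value, it is worth treating that warning as a likely bug rather than ignoring it.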
Accessing Vector Elements
```r
fruits <- c("apple", "banana", "orange", "grape", "mango")

# By position (1-based indexing!)
print(fruits[1])           # "apple"
print(fruits[3])           # "orange"

# Multiple positions
print(fruits[c(1, 3, 5)])  # "apple" "orange" "mango"

# By logical vector
print(fruits[c(TRUE, FALSE, TRUE, FALSE, TRUE)])  # "apple" "orange" "mango"

# Negative indices (exclude)
print(fruits[-2])          # All except "banana"
print(fruits[-c(1, 5)])    # All except first and last

# Named vectors
named_vec <- c(name = "Alice", age = "30", city = "New York")
print(named_vec["name"])   # [ keeps the name along with the value "Alice"
print(named_vec[["age"]])  # [[ returns the bare value "30"
```
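In practice the logical index is rarely typed out by hand; it usually comes from a comparison. The sketch below, using a made-up `scores` vector, shows that pattern together with `which()`.

```r
scores <- c(alice = 82, bob = 45, carol = 91, dave = 67)

# A comparison returns a logical vector the same length as scores
passed <- scores >= 60
print(passed)              # TRUE for alice, carol, dave; FALSE for bob

# Using it as an index keeps only the TRUE positions
print(scores[passed])      # alice 82, carol 91, dave 67

# which() converts the logical vector to integer positions
print(which(passed))       # alice 1, carol 3, dave 4

# Conditions combine element-wise with & (and) and | (or)
print(names(scores)[scores > 60 & scores < 90])   # "alice" "dave"
```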
Vector Coercion
```r
# Mixing types causes coercion
mixed <- c(1, 2, "three", TRUE)
print(mixed)          # All become characters: "1" "2" "three" "TRUE"
print(typeof(mixed))  # "character"

# Explicit coercion
num_vec <- c(1, 2, 3, 4)
char_vec <- as.character(num_vec)         # "1" "2" "3" "4"
logical_vec <- as.logical(c(0, 1, 0, 1))  # FALSE TRUE FALSE TRUE
```
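Implicit coercion follows a fixed order: logical, then integer, then double, then character. The result takes the most flexible type present. The sketch below illustrates that order and shows what happens when an explicit conversion cannot succeed. The example values are arbitrary.

```r
# The result type is the most flexible type among the inputs
print(typeof(c(TRUE, 1L)))     # "integer"
print(typeof(c(1L, 2.5)))      # "double"
print(typeof(c(2.5, "a")))     # "character"

# Logical values count as 1 and 0 in arithmetic
print(sum(c(TRUE, FALSE, TRUE)))   # 2

# A conversion that cannot succeed yields NA and a warning
x <- suppressWarnings(as.numeric(c("1", "2.5", "three")))
print(x)                       # 1.0 2.5 NA
print(is.na(x))                # FALSE FALSE TRUE
```

Checking for `NA` after an explicit conversion is a simple way to catch values that did not parse.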
Part 3: Matrices – 2D Homogeneous Data
Matrices are 2-dimensional extensions of vectors – they have rows and columns, but all elements must be the same type.
Creating Matrices
```r
# From a vector
m1 <- matrix(1:12, nrow = 3, ncol = 4)
print(m1)

# Fill by row instead of column (default is by column)
m2 <- matrix(1:12, nrow = 3, ncol = 4, byrow = TRUE)
print(m2)

# Combining vectors as rows
row1 <- c(1, 2, 3)
row2 <- c(4, 5, 6)
row3 <- c(7, 8, 9)
m3 <- rbind(row1, row2, row3)  # Row bind
print(m3)

# Combining vectors as columns
col1 <- c(1, 4, 7)
col2 <- c(2, 5, 8)
col3 <- c(3, 6, 9)
m4 <- cbind(col1, col2, col3)  # Column bind
print(m4)

# Named dimensions
m5 <- matrix(1:9, nrow = 3, ncol = 3,
             dimnames = list(
               c("Row1", "Row2", "Row3"),
               c("Col1", "Col2", "Col3")
             ))
print(m5)
```
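The same-type rule for matrices follows from how they are stored: a matrix is an atomic vector with a `dim` attribute. The short sketch below, with illustrative names, shows a vector being reshaped into a matrix and a single character column coercing every element.

```r
# A matrix is an atomic vector with a dim attribute
v <- 1:6
dim(v) <- c(2, 3)          # reshape in place, filled column by column
print(is.matrix(v))        # TRUE
print(v[2, 3])             # 6

# Storage is a single vector, so one character column forces
# every element to character
mixed <- cbind(c(1, 2), c("a", "b"))
print(typeof(mixed))       # "character"
print(mixed[1, 1])         # "1"
```

When columns genuinely need different types, a data frame is the better fit.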
Matrix Operations
```r
m <- matrix(1:9, nrow = 3)

# Basic properties
print(dim(m))     # 3 3
print(nrow(m))    # 3
print(ncol(m))    # 3
print(length(m))  # 9 (total elements)

# Arithmetic
m * 2   # Multiply every element by 2
m + 10  # Add 10 to every element

# Matrix multiplication
m1 <- matrix(1:4, nrow = 2)
m2 <- matrix(5:8, nrow = 2)
print(m1 %*% m2)  # Matrix multiplication

# Element-wise operations
m1 * m2  # Element-wise multiplication
m1 + m2  # Element-wise addition

# Transpose
print(t(m))

# Row and column sums
print(rowSums(m))
print(colSums(m))
print(rowMeans(m))
print(colMeans(m))
```
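The difference between `%*%` and `*` is easy to miss, so here is a small sketch that makes it visible using an identity matrix. The names `A`, `B`, `id2`, and `A_inv` are illustrative, and `solve()` is shown as one common follow-on operation. It assumes a square, non-singular matrix.

```r
A   <- matrix(1:4, nrow = 2)    # columns are (1, 2) and (3, 4)
id2 <- diag(2)                  # 2 x 2 identity matrix

# %*% is the linear-algebra product, so the identity leaves A unchanged
print(A %*% id2)

# * multiplies matching positions, so only the diagonal of A survives
print(A * id2)

# %*% needs ncol(left) to equal nrow(right)
B <- matrix(1:6, nrow = 2)      # 2 x 3
print(dim(A %*% B))             # 2 3

# solve() returns the inverse of a square, non-singular matrix
A_inv <- solve(A)
print(round(A %*% A_inv, 10))   # identity matrix, up to rounding error
```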
Accessing Matrix Elements
```r
m <- matrix(1:12, nrow = 3, ncol = 4,
            dimnames = list(
              c("R1", "R2", "R3"),
              c("C1", "C2", "C3", "C4")
            ))
print(m)

# Single element
print(m[2, 3])        # Row 2, Column 3
print(m["R2", "C3"])  # Using names

# Entire row
print(m[2, ])         # All columns of row 2

# Entire column
print(m[, 3])         # All rows of column 3

# Multiple rows and columns
print(m[c(1, 3), c(2, 4)])  # Rows 1 and 3, columns 2 and 4

# By logical conditions
print(m[m > 5])          # All elements > 5 (returns a vector)
print(m[, m[1, ] > 3])   # Columns where first row > 3
```
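One behavior that often surprises people is that selecting a single row or column drops the result to a plain vector. The sketch below shows this and the `drop = FALSE` argument that prevents it, plus `which(arr.ind = TRUE)` for locating matches by row and column. Variable names are illustrative.

```r
m <- matrix(1:12, nrow = 3, ncol = 4)

# Selecting one row drops the result to a plain vector
row_vec <- m[2, ]
print(dim(row_vec))        # NULL

# drop = FALSE keeps the result as a 1 x 4 matrix
row_mat <- m[2, , drop = FALSE]
print(dim(row_mat))        # 1 4

# arr.ind = TRUE reports row/column positions instead of vector positions
hits <- which(m > 10, arr.ind = TRUE)
print(hits)                # rows 2 and 3 of column 4
```

Keeping the matrix shape matters when later code calls `nrow()`, `ncol()`, or `%*%` on the result.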
Part 4: Arrays – Multi-dimensional Homogeneous Data
Arrays extend matrices to more than 2 dimensions.
Creating Arrays
```r
# 3D array (2 rows, 3 columns, 4 layers)
arr <- array(1:24, dim = c(2, 3, 4))
print(arr)

# With dimension names
dim_names <- list(
  c("Row1", "Row2"),
  c("Col1", "Col2", "Col3"),
  c("Layer1", "Layer2", "Layer3", "Layer4")
)

arr_named <- array(1:24, dim = c(2, 3, 4), dimnames = dim_names)
print(arr_named)

# Accessing elements
print(arr_named[1, 2, 3])  # Row1, Col2, Layer3
print(arr_named[, , 1])    # All rows and columns, first layer (a matrix)
print(arr_named[1, , ])    # First row, all columns and layers
```
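Beyond indexing, a common way to work with an array is to summarize it along one or more dimensions with `apply()`. The following is a minimal sketch on the same 2 x 3 x 4 shape. The names `layer_sums` and `cell_sums` are made up for illustration.

```r
arr <- array(1:24, dim = c(2, 3, 4))

# MARGIN = 3 applies the function to each layer (a 2 x 3 matrix)
layer_sums <- apply(arr, 3, sum)
print(layer_sums)                      # 21 57 93 129

# MARGIN = c(1, 2) collapses the layers, leaving a 2 x 3 matrix
cell_sums <- apply(arr, c(1, 2), sum)
print(cell_sums)

# Slicing one layer drops to a matrix unless drop = FALSE is given
print(dim(arr[, , 1]))                 # 2 3
print(dim(arr[, , 1, drop = FALSE]))   # 2 3 1
```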
Practical Array Example
|
# Create temperature data: 3 cities, 4 seasons, 2 years
cities <- c("New York", "London", "Tokyo")
seasons <- c("Spring", "Summer", "Fall", "Winter")
years <- c(2023, 2024)

# Random temperature data
set.seed(123)
temps <- array(
  round(runif(3 * 4 * 2, -10, 35), 1),
  dim = c(3, 4, 2),
  dimnames = list(cities, seasons, years)
)

print("Temperature Data (City x Season x Year):")
print(temps)

# Average temperature by city across all seasons and years
city_avg <- apply(temps, 1, mean)
print("Average temperature by city:")
print(city_avg)

# Average temperature by season across all cities and years
season_avg <- apply(temps, 2, mean)
print("Average temperature by season:")
print(season_avg)

# Compare years
year_avg <- apply(temps, 3, mean)
print("Average temperature by year:")
print(year_avg)
|
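`apply()` can also keep more than one dimension at a time. As a minimal sketch, assuming the same simulated temperature array (rebuilt here so the snippet runs on its own, with named dimensions added for readability), passing `c(1, 3)` as the margin averages over seasons and leaves a city-by-year matrix:

```r
set.seed(123)
temps <- array(
  round(runif(3 * 4 * 2, -10, 35), 1),
  dim = c(3, 4, 2),
  dimnames = list(
    city   = c("New York", "London", "Tokyo"),
    season = c("Spring", "Summer", "Fall", "Winter"),
    year   = c("2023", "2024")
  )
)

# Keep margins 1 (city) and 3 (year); the season dimension is averaged away
city_year_avg <- apply(temps, c(1, 3), mean)
print(city_year_avg)

# Each cell equals the mean of the matching slice of the array
all.equal(city_year_avg["Tokyo", "2024"], mean(temps["Tokyo", , "2024"]))
```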
Part 5: Lists – Heterogeneous Containers
Lists are the most flexible data structure in R – they can contain elements of different types and sizes.
Creating Lists
|
# Simple list
my_list <- list(
  name = "Alice",
  age = 30,
  scores = c(85, 92, 78),
  passed = TRUE,
  address = list(
    street = "123 Main St",
    city = "Boston",
    zip = "02108"  # Stored as a string so the leading zero is kept
  )
)
print(my_list)

# List with different structures
mixed_list <- list(
  numbers = 1:5,
  matrix = matrix(1:9, 3, 3),
  text = c("hello", "world"),
  func = function(x) x^2,
  nothing = NULL
)
str(mixed_list)  # str() prints the structure directly
|
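To see why a list is needed for mixed data, it may help to compare it with `c()`, which coerces everything to one common type. The sketch below also shows why identifiers such as ZIP codes are usually safer as strings. All variable names are illustrative.

```r
# c() coerces every element to one common type (here: character)
as_vector <- c("Alice", 30, TRUE)
class(as_vector)         # "character"

# list() keeps each element's own type
as_list <- list("Alice", 30, TRUE)
sapply(as_list, class)   # "character" "numeric" "logical"

# A numeric literal drops its leading zero; a string does not
zip_number <- 02108
zip_string <- "02108"
zip_number               # 2108
nchar(zip_string)        # 5
```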
Accessing List Elements
|
# Three ways to access:

# 1. $ operator - by name
print(my_list$name)
print(my_list$scores)

# 2. [[ ]] (double brackets) - returns the element itself
print(my_list[["age"]])
print(my_list[["address"]][["city"]])

# 3. [ ] (single brackets) - returns a sublist (not the element)
print(my_list["name"])            # Returns list(name = "Alice")
print(my_list[c("name", "age")])  # Returns list with two elements

# Difference between [ and [[
print(class(my_list["name"]))    # "list"
print(class(my_list[["name"]]))  # "character"
|
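A few further behaviors of these operators are worth knowing: `[[ ]]` also accepts positions, a missing name yields `NULL` rather than an error, and `$` matches names partially. The following is a small sketch on a hypothetical `person` list.

```r
person <- list(
  name = "Alice",
  scores = c(85, 92, 78),
  address = list(city = "Boston")
)

# [[ ]] also accepts a position
person[[2]]                  # 85 92 78

# Chained extraction reaches into nested lists and vectors
person$address$city          # "Boston"
person[["scores"]][2]        # 92

# A name that does not exist gives NULL, not an error
is.null(person[["phone"]])   # TRUE

# $ matches names partially; [[ ]] is exact by default
person$na                    # "Alice"
is.null(person[["na"]])      # TRUE
```

Because of partial matching, `[[ ]]` is the safer choice in scripts where a mistyped name should not silently match another element.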
Manipulating Lists
|
# Adding elements
my_list$new_element <- "added later"
my_list[["another"]] <- 42

# Removing elements
my_list$age <- NULL  # Remove age

# Combining lists
list1 <- list(a = 1, b = 2)
list2 <- list(c = 3, d = 4)
combined <- c(list1, list2)
print(combined)

# Length of list
print(length(my_list))

# Names of list elements
print(names(my_list))

# Convert to vector (if possible)
list_numbers <- list(1, 2, 3, 4)
vec_numbers <- unlist(list_numbers)
print(vec_numbers)  # Now a vector
|
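Two edge cases follow from the operations above. `unlist()` coerces mixed content to the most general type, and because assigning `NULL` removes an element, storing an actual `NULL` needs a different idiom. A minimal sketch with illustrative names:

```r
# unlist() flattens to the most general type present (here: character)
mixed <- list(a = 1, b = "two", c = TRUE)
flat <- unlist(mixed)
class(flat)    # "character"
names(flat)    # "a" "b" "c"

# Assigning NULL removes an element ...
lst <- list(x = 1, y = 2, z = 3)
lst$y <- NULL
names(lst)     # "x" "z"

# ... so keeping a NULL entry needs single brackets and list(NULL)
lst["y"] <- list(NULL)
names(lst)     # "x" "z" "y"
length(lst)    # 3
```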
Practical List Example – Model Results
|
# Simulate storing multiple model results
run_model <- function(model_name, data) {
  # Simulate different model outputs
  list(
    name = model_name,
    coefficients = runif(3, -2, 2),
    r_squared = runif(1, 0.7, 0.99),
    predictions = data + rnorm(length(data), 0, 2),
    residuals = rnorm(length(data)),
    convergence = sample(c(TRUE, FALSE), 1, prob = c(0.9, 0.1))
  )
}

# Run several models
data <- 1:10
models <- list(
  lm1 = run_model("Linear Model", data),
  lm2 = run_model("Robust Model", data),
  lm3 = run_model("Bayesian Model", data)
)

# Analyze results
for (model_name in names(models)) {
  model <- models[[model_name]]
  cat("\n", model_name, ":\n")
  cat(" R-squared:", round(model$r_squared, 3), "\n")
  cat(" Converged:", model$convergence, "\n")
  if (model$convergence) {
    cat(" Coefs:", paste(round(model$coefficients, 3), collapse = ", "), "\n")
  }
}
|
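When the same field is needed from every element of a list of results, `vapply()` is often more convenient than a loop. The sketch below uses fixed, hypothetical values rather than the random output of `run_model()` so that the result is predictable.

```r
# Hypothetical results with the same field names as the simulated models
models <- list(
  lm1 = list(name = "Linear Model",   r_squared = 0.81, convergence = TRUE),
  lm2 = list(name = "Robust Model",   r_squared = 0.93, convergence = FALSE),
  lm3 = list(name = "Bayesian Model", r_squared = 0.88, convergence = TRUE)
)

# vapply() checks that each extracted value has the declared type and length
r2 <- vapply(models, function(m) m$r_squared, numeric(1))
converged <- vapply(models, function(m) m$convergence, logical(1))

# Best R-squared among the models that converged
best <- names(which.max(r2[converged]))
models[[best]]$name   # "Bayesian Model"
```

Unlike `sapply()`, `vapply()` fails loudly if one element returns something unexpected, which is usually preferable when collecting model results.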
Part 6: Data Frames – The Workhorse
Data frames are 2-dimensional structures in which each column is a vector of a single type, but different columns can have different types. All columns have the same number of rows. They’re the most common data structure for data analysis.
Creating Data Frames
|
# Create from vectors
df <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Diana"),
  age = c(25, 30, 35, 28),
  height = c(165, 180, 175, 170),
  student = c(TRUE, FALSE, TRUE, FALSE),
  grade = c("A", "B", "A-", "B+"),
  stringsAsFactors = FALSE  # Don't convert strings to factors (the default since R 4.0.0)
)
print(df)

# Check structure
str(df)
summary(df)

# Create with row names
df_named <- data.frame(
  row.names = c("ID1", "ID2", "ID3", "ID4"),
  age = c(25, 30, 35, 28),
  score = c(85, 92, 78, 88)
)
print(df_named)
|
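Under the hood, a data frame is a list of columns with equal length, which explains why list tools such as `length()` and `sapply()` work on it column by column. A minimal sketch on a smaller, self-contained data frame:

```r
df <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 35),
  student = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

is.list(df)         # TRUE
length(df)          # 3 (the number of columns)
sapply(df, class)   # name: "character", age: "numeric", student: "logical"

# Columns whose lengths cannot be recycled to a common size raise an error
result <- tryCatch(
  data.frame(x = 1:3, y = 1:2),
  error = function(e) "error"
)
result              # "error"
```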
Accessing Data Frame Elements
|
# Multiple ways to access

# By column name (returns vector)
print(df$name)
print(df[["age"]])

# By column name (returns data frame)
print(df["name"])
print(df[c("name", "age")])

# By row and column indices
print(df[2, 3])  # Row 2, Column 3
print(df[2, ])   # Row 2, all columns
print(df[, 3])   # All rows, Column 3

# By condition
print(df[df$age > 28, ])                            # Rows where age > 28
print(df[df$student == TRUE, c("name", "grade")])  # Students' names and grades

# Using subset function
print(subset(df, age > 28 & student == TRUE))
|
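Two pitfalls of bracket subsetting are worth a quick illustration: selecting one column with `[ , j]` simplifies the result to a vector, and a condition that evaluates to `NA` produces a row of `NA`s. The sketch below uses a small data frame with one missing age, which is an assumption added for the example.

```r
df <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Diana"),
  age = c(25, NA, 35, 28),
  stringsAsFactors = FALSE
)

# A single column selected with [ , j] is simplified to a vector ...
class(df[, "name"])                 # "character"
# ... unless drop = FALSE is given
class(df[, "name", drop = FALSE])   # "data.frame"

# An NA in the condition yields an extra all-NA row
nrow(df[df$age > 26, ])             # 3

# which() and subset() both discard the NA case
nrow(df[which(df$age > 26), ])      # 2
nrow(subset(df, age > 26))          # 2
```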
Manipulating Data Frames
|
# Adding columns
df$gender <- c("F", "M", "M", "F")
df[["test_score"]] <- c(85, 92, 78, 88)

# Adding a calculated column
df$bmi <- df$height / 100  # Placeholder: height in meters, not a real BMI

# Adding rows (the new row needs the same column names as df)
new_row <- data.frame(
  name = "Eve",
  age = 32,
  height = 168,
  student = TRUE,
  grade = "A",
  gender = "F",
  test_score = 95,
  bmi = 1.68
)
df <- rbind(df, new_row)
print(df)

# Removing columns
df$bmi <- NULL  # Remove column

# Reordering columns
df <- df[, c("name", "age", "gender", "height", "student", "grade", "test_score")]
|
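`rbind()` on data frames matches columns by name rather than by position, and it rejects rows whose columns do not line up. The sketch below demonstrates this on a small, self-contained data frame and also shows `transform()` as an alternative way to add a derived column. The `age_months` column is purely illustrative.

```r
df <- data.frame(
  name = c("Alice", "Bob"),
  age = c(25, 30),
  stringsAsFactors = FALSE
)

# Columns are matched by name, so the order in the new row can differ
new_row <- data.frame(age = 32, name = "Eve", stringsAsFactors = FALSE)
df <- rbind(df, new_row)
df$name      # "Alice" "Bob" "Eve"

# A row with a different number of columns is rejected
bad_row <- data.frame(name = "Zed", stringsAsFactors = FALSE)
outcome <- tryCatch(rbind(df, bad_row), error = function(e) "error")
outcome      # "error"

# transform() adds a derived column without repeating df$
df <- transform(df, age_months = age * 12)
names(df)    # "name" "age" "age_months"
```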
Common Data Frame Operations
```r
# Basic info
nrow(df)   # Number of rows
ncol(df)   # Number of columns
dim(df)    # Both dimensions
names(df)  # Column names
head(df)   # First rows (up to 6 by default)
tail(df)   # Last rows (up to 6 by default)

# Sorting
df_sorted <- df[order(df$age, decreasing = TRUE), ]
print(df_sorted)

# Multiple sort criteria: student ascending, then age descending
df_sorted2 <- df[order(df$student, -df$age), ]
print(df_sorted2)

# Aggregating
aggregate(test_score ~ student, data = df, FUN = mean)
aggregate(cbind(age, test_score) ~ gender, data = df, FUN = mean)

# Merging data frames
df2 <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Frank"),
  department = c("Sales", "IT", "HR", "IT")
)
# all.x = TRUE keeps every row of df (a left join); unmatched rows get NA
merged <- merge(df, df2, by = "name", all.x = TRUE)
print(merged)
```
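The merge above uses `all.x = TRUE`, which keeps every row of the left-hand data frame. As a minimal sketch of how the other `merge()` options compare, the example below uses two small made-up data frames; `people` and `depts` are illustrative names and are not objects defined elsewhere in this chapter.

```r
# Hypothetical example data (not the chapter's df)
people <- data.frame(
  name = c("Alice", "Bob", "Eve"),
  age  = c(25, 30, 32)
)
depts <- data.frame(
  name       = c("Alice", "Bob", "Frank"),
  department = c("Sales", "IT", "IT")
)

inner <- merge(people, depts, by = "name")                # only names found in both
left  <- merge(people, depts, by = "name", all.x = TRUE)  # every row of people
right <- merge(people, depts, by = "name", all.y = TRUE)  # every row of depts
full  <- merge(people, depts, by = "name", all = TRUE)    # every name from either side

print(inner)
print(left)
print(right)
print(full)
```

Rows without a match on the other side are filled with `NA`, so it is worth checking for missing values after any join that is not an inner join.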
Part 7: Factors – For Categorical Data
Factors are designed to store categorical data efficiently. They store both the values and their possible levels.
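To make the "values plus levels" idea concrete, the sketch below looks inside a small factor using base R only. The variable name `blood_type` is purely illustrative.

```r
blood_type <- factor(c("A", "O", "B", "O", "A"))

levels(blood_type)      # the distinct categories, sorted by default
as.integer(blood_type)  # the integer codes that index into the levels
typeof(blood_type)      # underlying storage type
class(blood_type)       # the class that makes R treat it as categorical

# Looking the codes up in the levels recovers the original labels
rebuilt <- levels(blood_type)[as.integer(blood_type)]
identical(rebuilt, as.character(blood_type))
```

Because each value is stored as a small integer plus a shared lookup table of labels, a factor with many repeated categories avoids repeating the label strings.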
Creating Factors
```r
# Create a factor
colors <- factor(c("red", "blue", "green", "red", "blue"))
print(colors)
print(levels(colors))
print(nlevels(colors))

# With specified levels
sizes <- factor(c("M", "L", "S", "M", "XL"),
                levels = c("XS", "S", "M", "L", "XL"))
print(sizes)

# Ordered factors
ratings <- factor(c("Good", "Bad", "Excellent", "Good", "Poor"),
                  levels = c("Bad", "Poor", "Good", "Excellent"),
                  ordered = TRUE)
print(ratings)
print(ratings[1] > ratings[2])  # Can compare ordered factors
```
Working with Factors
```r
# Converting to/from factors
colors_char <- as.character(colors)
colors_factor <- as.factor(colors_char)

# Tabulating
table(colors)
table(sizes)

# In data frames
df_factor <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  gender = factor(c("F", "M", "M")),
  education = factor(c("College", "High School", "College"),
                     levels = c("High School", "College", "Graduate"),
                     ordered = TRUE)
)
print(df_factor)
str(df_factor)

# Adding new levels (can't add directly)
# df_factor$gender[3] <- "X"  # Warning: invalid factor level, NA generated

# First add level
levels(df_factor$gender) <- c(levels(df_factor$gender), "X")
df_factor$gender[3] <- "X"  # Now works (row 3; the data frame has only 3 rows)
print(df_factor)
```
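Two further factor behaviours often surprise newcomers: converting number-like factors back to numbers, and levels that linger after subsetting. The following is a short base R sketch with made-up data; `year_f` and `recent` are illustrative names.

```r
# A factor whose labels happen to look like numbers (hypothetical data)
year_f <- factor(c("2021", "2023", "2022", "2023"))

as.numeric(year_f)                # the integer codes, not the years
as.numeric(as.character(year_f))  # the actual years

# Subsetting keeps every level, even ones that no longer occur
recent <- year_f[year_f != "2021"]
levels(recent)                    # still lists "2021"

# droplevels() removes the unused levels
recent <- droplevels(recent)
levels(recent)
```

Going through `as.character()` first is the usual way to recover the numeric values, since `as.numeric()` on a factor returns its internal codes.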
Part 8: Tibbles – Modern Data Frames
Tibbles are a modern reimagining of data frames, provided by the tibble package (part of the tidyverse). They print more compactly and apply stricter subsetting rules than base data frames.
```r
# Install if needed
# install.packages("tibble")
library(tibble)

# Create a tibble
tbl <- tibble(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 35),
  score = c(85, 92, 78),
  passed = c(TRUE, TRUE, FALSE)
)
print(tbl)  # Much nicer printing than data.frame

# Tibble advantages
# 1. Never changes strings to factors
#    (data.frame() also stopped doing this by default in R 4.0.0)
str(tbl)

# 2. Can have list columns
tbl_complex <- tibble(
  name = c("Alice", "Bob"),
  data = list(
    c(1, 2, 3),
    matrix(1:4, 2, 2)
  )
)
print(tbl_complex)

# 3. Better subsetting
tbl$age
tbl[["age"]]
tbl[, "age"]  # Returns tibble, not vector
tbl[1, ]      # Always returns tibble

# Creating from data.frame
df <- data.frame(x = 1:5, y = letters[1:5])
tbl2 <- as_tibble(df)
```
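For contrast with the tibble subsetting shown above, here is a sketch of how a base data frame behaves in the same situations. It uses only base R, and `people_df` is a hypothetical data frame introduced for illustration.

```r
people_df <- data.frame(name = c("Alice", "Bob"), age = c(25, 30))

# Selecting a single column with [ , ] drops to a plain vector by default
class(people_df[, "age"])

# drop = FALSE keeps the data frame shape
class(people_df[, "age", drop = FALSE])

# $ on a base data frame allows partial name matching:
# "na" is a unique prefix of "name", so this returns the name column
people_df$na
```

The silent change of return type and the partial matching are the kinds of behaviour that tibbles are designed to avoid, which is what "stricter subsetting" refers to.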
Part 9: Choosing the Right Data Structure
Decision Guide
```r
# Use a VECTOR when:
# - You have a single sequence of values (all same type)
# - You need simple 1D storage
scores <- c(85, 92, 78, 95, 88)

# Use a MATRIX when:
# - You have 2D data all of the same type
# - You need to do matrix algebra
correlation_matrix <- cor(matrix(rnorm(100), 10, 10))

# Use an ARRAY when:
# - You have 3D or higher dimensional data
# - All data is the same type
# - You need to store something like images or time-series across multiple dimensions
image_data <- array(runif(100 * 100 * 3), dim = c(100, 100, 3))  # RGB image

# Use a LIST when:
# - You need to store different types of data together
# - Each element can have different lengths
# - You're building complex nested structures
patient_record <- list(
  id = "P12345",
  name = "John Doe",
  visits = c("2023-01-15", "2023-03-20", "2023-06-10"),
  vitals = data.frame(
    date = c("2023-01-15", "2023-03-20", "2023-06-10"),
    bp_systolic = c(120, 118, 122),
    bp_diastolic = c(80, 79, 81)
  ),
  insurance = list(
    provider = "HealthPlus",
    policy = "HP98765",
    valid_until = "2024-12-31"
  )
)

# Use a DATA FRAME when:
# - You have tabular data with mixed types
# - Each column represents a variable
# - Each row represents an observation
# - You're doing data analysis (this is the most common!)
survey_data <- data.frame(
  respondent_id = 1:100,
  age = sample(18:80, 100, replace = TRUE),
  gender = sample(c("M", "F"), 100, replace = TRUE),
  satisfaction = sample(1:5, 100, replace = TRUE),
  comments = sample(c("Good", "OK", "Bad", NA), 100, replace = TRUE)
)

# Use a FACTOR when:
# - You have categorical data with known levels
# - You need to control the order of categories
# - You're doing statistical modeling (factors are important!)
education <- factor(
  c("High School", "College", "Graduate", "College", "High School"),
  levels = c("High School", "College", "Graduate"),
  ordered = TRUE
)

# Use a TIBBLE when:
# - You want modern data frame features
# - You're working in the tidyverse ecosystem
# - You want nicer printing by default
# - You need list columns
library(tibble)
modern_data <- tibble(
  id = 1:5,
  name = c("A", "B", "C", "D", "E"),
  nested = list(
    c(1, 2),
    1:5,
    matrix(1:4, 2),
    data.frame(x = 1:3),
    list(a = 1, b = 2)
  )
)
```
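The guide above helps when creating new data. When you receive an object and are unsure which structure it already is, a few base R predicates can tell you. The helper below is a rough sketch; `describe_structure()` is a hypothetical name introduced here, not a built-in function.

```r
describe_structure <- function(x) {
  # Order matters: test the more specific structures first,
  # because a data frame is also a list and a matrix is also an array.
  if (is.data.frame(x)) {
    "data frame"   # tibbles also land here, since they inherit from data.frame
  } else if (is.factor(x)) {
    "factor"
  } else if (is.matrix(x)) {
    "matrix"
  } else if (is.array(x)) {
    "array"
  } else if (is.list(x)) {
    "list"
  } else if (is.atomic(x)) {
    "vector"
  } else {
    "other"
  }
}

describe_structure(c(85, 92, 78))
describe_structure(matrix(1:6, nrow = 2))
describe_structure(array(1:24, dim = c(2, 3, 4)))
describe_structure(list(id = "P12345", visits = 3))
describe_structure(data.frame(x = 1:3))
describe_structure(factor(c("a", "b")))
```

For everyday work, `class(x)` and `str(x)` give the same information in more detail; the helper only illustrates how the structures nest inside one another.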
Part 10: Converting Between Structures
|
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
```r
# Vector to matrix (filled column by column)
v <- 1:12
m <- matrix(v, nrow = 3)
print(m)

# Matrix to data frame
df_from_m <- as.data.frame(m)
print(df_from_m)

# Data frame to matrix (all columns numeric)
df_num <- data.frame(x = 1:3, y = 4:6, z = 7:9)
m_from_df <- as.matrix(df_num)
print(m_from_df)

# List to vector (works cleanly when all elements share a type)
l <- list(1, 2, 3, 4)
v_from_l <- unlist(l)
print(v_from_l)

# Data frame to list (one list element per column)
# Uses df_num from above so this block runs on its own
l_from_df <- as.list(df_num)
print(l_from_df)

# Matrix to array: reshape the 3 x 4 matrix into a 3 x 2 x 2 array
arr_from_m <- array(m, dim = c(3, 2, 2))
print(arr_from_m)
```
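The conversions above all start from data of a single type. As a hedged illustration of what happens otherwise, the sketch below uses a made-up data frame `df_mixed` and list `l_mixed` (both hypothetical names, not taken from earlier in the chapter) to show that `as.matrix()` and `unlist()` fall back to one common type when the input is mixed.

```r
# Hypothetical mixed-type inputs (names are illustrative only)
df_mixed <- data.frame(id = 1:3, label = c("a", "b", "c"))
l_mixed  <- list(1, "two", TRUE)

# A matrix holds a single type, so the numeric column becomes text
m_mixed <- as.matrix(df_mixed)
print(typeof(m_mixed))

# unlist() also coerces everything to the most flexible type present
v_mixed <- unlist(l_mixed)
print(v_mixed)

# Selecting the numeric columns first keeps the matrix numeric
num_cols  <- vapply(df_mixed, is.numeric, logical(1))
m_numeric <- as.matrix(df_mixed[, num_cols, drop = FALSE])
print(typeof(m_numeric))
```

Checking `typeof()` after a conversion is a cheap way to catch silent coercion before it affects later arithmetic.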
Part 11: Practical Example – Complete Data Analysis Pipeline
Let’s combine everything we’ve learned in a realistic example:
```r
# Step 1: Create raw data (using vectors)
set.seed(123)
n <- 100

customer_ids <- paste0("CUST", sprintf("%04d", 1:n))
ages <- sample(18:70, n, replace = TRUE)
genders <- sample(c("M", "F"), n, replace = TRUE)
purchases <- rpois(n, lambda = 5)        # Number of purchases
amounts <- round(runif(n, 10, 500), 2)   # Purchase amounts
satisfaction <- sample(1:5, n, replace = TRUE,
                       prob = c(0.05, 0.1, 0.3, 0.35, 0.2))

# Step 2: Create data frame (main structure)
customers <- data.frame(
  id = customer_ids,
  age = ages,
  gender = as.factor(genders),
  purchases = purchases,
  total_spent = amounts,
  satisfaction = as.ordered(satisfaction),
  stringsAsFactors = FALSE
)

# Step 3: Add calculated columns
customers$avg_purchase <- customers$total_spent / customers$purchases
# Zero purchases give Inf (or NaN for 0 / 0), so reset every non-finite value to 0
customers$avg_purchase[!is.finite(customers$avg_purchase)] <- 0

# Step 4: Create segments using factors
customers$age_group <- cut(customers$age,
                           breaks = c(0, 25, 40, 60, 100),
                           labels = c("Young", "Adult", "Middle", "Senior"))

customers$value_segment <- factor(
  ifelse(customers$total_spent > 400, "High",
         ifelse(customers$total_spent > 200, "Medium", "Low")),
  levels = c("Low", "Medium", "High"),
  ordered = TRUE
)

# Step 5: Create summary matrices
segment_summary <- matrix(0,
                          nrow = length(levels(customers$age_group)),
                          ncol = length(levels(customers$value_segment)))
rownames(segment_summary) <- levels(customers$age_group)
colnames(segment_summary) <- levels(customers$value_segment)

# Count customers in each age group / value segment combination
for (i in seq_len(nrow(customers))) {
  age_idx <- which(rownames(segment_summary) == customers$age_group[i])
  val_idx <- which(colnames(segment_summary) == customers$value_segment[i])
  segment_summary[age_idx, val_idx] <- segment_summary[age_idx, val_idx] + 1
}

# Step 6: Create list of results
analysis_results <- list(
  raw_data = customers,
  summary_stats = list(
    by_gender = aggregate(cbind(purchases, total_spent) ~ gender,
                          data = customers, FUN = mean),
    by_age_group = aggregate(cbind(purchases, total_spent) ~ age_group,
                             data = customers, FUN = mean),
    satisfaction_dist = table(customers$satisfaction)
  ),
  matrices = list(
    segment_counts = segment_summary,
    correlation = cor(customers[, c("age", "purchases", "total_spent", "avg_purchase")])
  ),
  metadata = list(
    created = Sys.time(),
    n_customers = n,
    version = "1.0"
  )
)

# Step 7: Explore the results
str(analysis_results, max.level = 2)

# View summary statistics
print(analysis_results$summary_stats$by_gender)
print(analysis_results$summary_stats$by_age_group)

# View segment counts
print(analysis_results$matrices$segment_counts)

# View the correlation matrix, rounded to 3 decimals
print(round(analysis_results$matrices$correlation, 3))
```
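The counting loop in Step 5 can also be expressed with base R's `table()`, which cross-tabulates two factors directly. The sketch below uses a small made-up sample (the vectors `age_group` and `value_segment` here are illustrative stand-ins, not the simulated customers) so the counts are easy to check by eye.

```r
# Illustrative stand-ins for customers$age_group and customers$value_segment
age_group <- factor(c("Young", "Adult", "Adult", "Senior", "Adult"),
                    levels = c("Young", "Adult", "Middle", "Senior"))
value_segment <- factor(c("Low", "High", "Low", "Medium", "Low"),
                        levels = c("Low", "Medium", "High"), ordered = TRUE)

# table() counts every combination of levels, including empty ones
segment_counts <- table(age_group, value_segment)
print(segment_counts)

# Drop the "table" class if a plain integer matrix is needed downstream
segment_matrix <- unclass(segment_counts)
print(is.matrix(segment_matrix))
```

The explicit loop is useful for seeing how matrix indexing works, while `table()` is usually the shorter choice once the idea is clear.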
Summary: The Data Structure Philosophy
Choosing the right data structure is like choosing the right tool for a job:
- Vectors: Simple lists of the same thing (like a shopping list)
- Matrices: Tables where everything is the same type (like a spreadsheet of numbers)
- Arrays: Multi-dimensional data (like a stack of matrices)
- Lists: Collections of different things (like a toolbox)
- Data Frames: Mixed data in table form (like a database table)
- Factors: Categories with levels (like survey responses)
- Tibbles: Modern, enhanced data frames
Key principles:
- Homogeneous vs Heterogeneous: Same type or different types?
- Dimensions: 1D, 2D, or nD?
- Structure: Do you need row/column orientation?
- Operations: What will you do with the data?
- Memory efficiency: Factors for categories, matrices for numbers
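The memory point can be checked directly. The following is a minimal sketch with made-up data (the names `labels_chr` and `labels_fct` are hypothetical): a factor stores integer codes plus a short vector of levels, which on typical 64-bit builds of R takes less space than the equivalent character vector. Exact sizes vary by platform and R version, so no specific numbers are claimed here.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical categorical data: 100,000 labels drawn from three categories
labels_chr <- sample(c("Low", "Medium", "High"), 1e5, replace = TRUE)
labels_fct <- factor(labels_chr, levels = c("Low", "Medium", "High"))

# A factor is stored as integer codes plus its levels
print(typeof(labels_fct))
print(levels(labels_fct))

# Compare approximate memory use of the two representations
print(object.size(labels_chr), units = "Kb")
print(object.size(labels_fct), units = "Kb")
```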
Data structure selection guide:
- Simple list of numbers → Vector
- 2D table of numbers → Matrix
- 2D table with mixed types → Data Frame
- Need tidyverse features → Tibble
- Complex, nested data → List
- Multi-dimensional data → Array
- Categorical data → Factor
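When it is unclear which structure an existing object already is, `class()` and `dim()` answer the two questions the guide relies on: what kind of container it is, and how many dimensions it has. The sketch below builds one small, made-up object per base structure (tibbles are left out because they need the tidyverse) and inspects each one.

```r
# Small illustrative objects (all values are made up)
examples <- list(
  vector     = c(1.5, 2.5, 3.5),
  matrix     = matrix(1:6, nrow = 2),
  array      = array(1:24, dim = c(2, 3, 4)),
  list       = list(id = 1, tags = c("a", "b")),
  data_frame = data.frame(x = 1:3, y = c("a", "b", "c")),
  factor     = factor(c("low", "high", "low"))
)

# First class of each object
kinds <- vapply(examples, function(x) class(x)[1], character(1))
print(kinds)

# Length of the dim attribute: 0 means a 1D object with no dim attribute
n_dims <- vapply(examples, function(x) length(dim(x)), integer(1))
print(n_dims)
```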
Mastering R’s data structures is like learning the grammar of a language – once you know them, you can express any data analysis task clearly and efficiently. Each structure has its strengths, and knowing when to use which is the key to writing elegant, efficient R code.
