====== C - Basic Arrays ====== **NOTE**: This tutorial assumes you are comfortable with [[c:pointers|pointers]]. ---- ===== A single variable is used to store the age of a person ===== ==== C Code Listing 1 ==== 1: #include 2: 3: int main() 4: { 5: short age; 6: age=23; 7: printf("%d\n", age); 8: return 0; 9: } ---- ==== C++ Code Listing 1 ==== 1: #include 2: 3: int main() 4: { 5: short age; 6: age=23; 7: std::cout << age << std::endl; 8: return 0; 9: } **NOTE:** The variable **age** is created at line (5) as a short. * A value is assigned to age. * Finally, age is printed to the screen. Image: A simple variable in memory {{:c:c_address_age_as_var.jpg?400|}} ---- ===== Keep track of 4 ages instead of just one ===== 4 separate variables could be used, but this does not scale easily if the number to track moves into the thousands. Instead of using separate variables to store each age, it is better to use an array. ==== C Code Listing 2 ==== 1: #include 2: 3: int main() 4: { 5: short age[4]; 6: age[0]=23; 7: age[1]=34; 8: age[2]=65; 9: age[3]=74; 10: return 0; 11: } ---- ==== C++ Code Listing 2 ==== 1: #include 2: 3: int main() 4: { 5: short age[4]; 6: age[0]=23; 7: age[1]=34; 8: age[2]=65; 9: age[3]=74; 10: return 0; 11: } **NOTE:** This creates and initializes an array. * On line (5), an array of 4 shorts is created. * Values are assigned to each variable in the array on line (6) through line (9). Image: Array with 4 elements {{:c:c_address_age_as_arr.jpg?400|}} ---- **NOTE:** Accessing any single short variable, or element, in the array is straightforward. Simply provide a number in square braces next to the name of the array. The number identifies which of the 4 elements in the array you want to access. The program above shows that the first element of an array is accessed with the number 0 rather than 1. ---- ===== Printing arrays ===== Print the array to the screen: ==== C Code Listing 3 ==== 1: #include 2: 3: int main() 4: { 5: short age[4]; 6: age[0]=23; 7: age[1]=34; 8: age[2]=65; 9: age[3]=74; 10: 11: printf("%x\n", age); 12: return 0; 13: } ---- ==== C++ Code Listing 3 ==== 1: #include 2: 3: int main() 4: { 5: short age[4]; 6: age[0]=23; 7: age[1]=34; 8: age[2]=65; 9: age[3]=74; 10: 11: std::cout << age << std::endl; 12: return 0; 13: } **NOTE:** Line (11) is meant to print the 4 ages to the screen. But instead of printing out the four short variables, what appears to be nonsense prints out instead. What the "nonsense" output actually is and why the 4 array values were not printed will be addressed later. For now, the important point to come away with is that simply providing the name of the array in an output statement will not print out the elements of the array. ---- How about printing out each of the values separately? Try this: ==== C Code Listing 4 ==== 1: #include 2: 3: int main() 4: { 5: short age[4]; 6: age[0]=23; 7: age[1]=34; 8: age[2]=65; 9: age[3]=74; 10: printf("%d\n", age[0]); 11: printf("%d\n", age[1]); 12: printf("%d\n", age[2]); 13: printf("%d\n", age[3]); 14: return 0; 15: } ---- ==== C++ Code Listing 4 ==== 1: #include 2: 3: int main() 4: { 5: short age[4]; 6: age[0]=23; 7: age[1]=34; 8: age[2]=65; 9: age[3]=74; 10: std::cout << age[0] << std::endl; 11: std::cout << age[1] << std::endl; 12: std::cout << age[2] << std::endl; 13: std::cout << age[3] << std::endl; 14: return 0; 15: } **NOTE:** Lines (10) through line (13) produce the output we are expecting. There is no single statement in the language that says "print an entire array to the screen". Each element in the array must be printed to the screen individually. ---- ===== Copying arrays ===== Suppose that after filling our 4 element array with values, we need to copy that array to another array of 4 short's? Try this: ==== C Code Listing 5 ==== 1: #include 2: 3: int main() 4: { 5: short age[4]; 6: short same_age[4]; 7: age[0]=23; 8: age[1]=34; 9: age[2]=65; 10: age[3]=74; 11: 12: same_age=age; 13: 14: printf("%d\n", same_age[0]); 15: printf("%d\n", same_age[1]); 16: printf("%d\n", same_age[2]); 17: printf("%d\n", same_age[3]); 18: return 0; 19: } ---- ==== C++ Code Listing 5 ==== 1: #include 2: 3: int main() 4: { 5: short age[4]; 6: short same_age[4]; 7: age[0]=23; 8: age[1]=34; 9: age[2]=65; 10: age[3]=74; 11: 12: same_age=age; 13: 14: std::cout << same_age[0] << std::endl; 15: std::cout << same_age[1] << std::endl; 16: std::cout << same_age[2] << std::endl; 17: std::cout << same_age[3] << std::endl; 18: return 0; 19: } **NOTE:** Line (12) tries to copy the age array into the same_age array. What happened when you tried to compile the program above? The point here is that simply assigning one array to another will not copy the elements of the array. The hard question to answer is why the code doesn't compile. Later in the tutorial this example will be re-examined to explain why line (12) doesn't work. This code should not compile on either C or C++ compilers. However, some older C++ compilers may ignore the ISO C++ standard and allow line 12 to compile. If it does compile on your C++ compiler, make a mental note that it is incorrect behavior. ---- Let's try copying arrays using a technique similar to the technique used to print arrays (that is, one element at a time): ==== C Code Listing 6 ==== 1: #include 2: 3: int main() 4: { 5: short age[4]; 6: short same_age[4]; 7: 8: age[0]=23; 9: age[1]=34; 10: age[2]=65; 11: age[3]=74; 12: 13: same_age[0]=age[0]; 14: same_age[1]=age[1]; 15: same_age[2]=age[2]; 16: same_age[3]=age[3]; 17: 18: printf("%d\n", same_age[0]); 19: printf("%d\n", same_age[1]); 20: printf("%d\n", same_age[2]); 21: printf("%d\n", same_age[3]); 22: return 0; 23: } ---- ==== C++ Code Listing 6 ==== 1: #include 2: 3: int main() 4: { 5: short age[4]; 6: short same_age[4]; 7: 8: age[0]=23; 9: age[1]=34; 10: age[2]=65; 11: age[3]=74; 12: 13: same_age[0]=age[0]; 14: same_age[1]=age[1]; 15: same_age[2]=age[2]; 16: same_age[3]=age[3]; 17: 18: std::cout << same_age[0] << std::endl; 19: std::cout << same_age[1] << std::endl; 20: std::cout << same_age[2] << std::endl; 21: std::cout << same_age[3] << std::endl; 23: return 0; 23: } **NOTE** This technique for copying arrays works fine. * Two arrays are created: age and same_age. * Each element of the age array is assigned a value. * Then, in order to copy of the four elements in age into the same_age array, we must do it element by element. Image: Copy element 1 Copy first element {{:c:c_address_arr_copy_first.jpg?400|}} Image: Copy element 2 Copy second element {{:c:c_address_arr_copy_second.jpg?500|}} Like printing arrays, there is no single statement in the language that says "copy an entire array into another array". The array elements must be copied individually. The technique used to copy one array into another is exactly the same as the technique used to copy 4 separate variables into 4 other variables. So what is the advantage to using arrays over separate variables? One significant advantage of an array over separate variables is the name. In our examples, using four separate variables requires 4 unique names. The 4 short variables in our array have the same name, age. The 4 short's in the array are identical except for an index number used to access them. This distinction allows us to shorten our code in a way that would be impossible with 4 variables, each with unique names: ==== C Code Listing 7 ==== 1: #include 2: 3: int main() 4: { 5: short age[4]; 6: short same_age[4]; 7: int i, j; 8: age[0]=23; 9: age[1]=34; 10: age[2]=65; 11: age[3]=74; 12: 13: for(i=0; i<4; i++) 14: same_age[i]=age[i]; 15: 16: for(j=0; j<4; j++) 17: printf("%d\n", same_age[j]); 18: return 0; 19: } ---- ==== C++ Code Listing 7 ==== 1: #include 2: 3: int main() 4: { 5: short age[4]; 6: short same_age[4]; 7: int i, j; 8: age[0]=23; 9: age[1]=34; 10: age[2]=65; 11: age[3]=74; 12: 13: for(i=0; i<4; i++) 14: same_age[i]=age[i]; 15: 16: for(j=0; j<4; j++) 17: std::cout << same_age[j] << std::endl; 18: return 0; 19: } Since the only difference between each of the short's in the arrays is their index, a loop and a counter can be used to more easily copy all of the elements. The same technique is used to shorten the code that prints the array to the screen. Even though arrays give us some convenience when managing many variables of the same type, there is little difference between an array and variables declared individually. There is no single statement to copy an array, there is no single statement to print an array. If we want to perform any action on an array, we must repeatedly perform that action on each element in the array. ---- ===== Array Internals ===== To understand what exactly is happening inside of arrays, first recall the behavior of variables from the pointer tutorial: Try: ^C++^C^ | 1: #include 2: int main() 3: { 4: float fl=3.14; 5: std::cout << fl << std::endl; 6: return 0; 7: } | 1:#include 2:int main() 3: { 4: float fl=3.14; 5: printf("%.2f\n", fl); 6: return 0; 7: } | At line (4) in the program above, the computer reserves memory for fl. In our examples, we'll assume that a float requires 4 bytes. Depending on the computer's architecture, a float may require 2, 4, 8 or some other number of bytes. Image: Variable fl allocated {{:c:c_address_float.jpg?400|}} When fl is used in line (5), two distinct steps occur: * The program finds and grabs the address reserved for fl--in this example 924. * The contents stored at that address are retrieved. To generalize, whenever any variable is accessed, the above two distinct steps occur to retrieve the contents of the variable. So, how is this related to referring to arrays without providing an index number (e.g. age, not age[2])? Here's the program from earlier in the tutorial that was our first crude attempt at printing out the contents of an array: C Code Listing 8 1: #include 2: 3: int main() 4: { 5: short age[4]; 6: age[0]=23; 7: age[1]=34; 8: age[2]=65; 9: age[3]=74; 10: 11: printf("%x\n", age); 12: return 0; 13: } C++ Code Listing 8 1: #include 2: 3: int main() 4: { 5: short age[4]; 6: age[0]=23; 7: age[1]=34; 8: age[2]=65; 9: age[3]=74; 10: 11: std::cout << age << std::endl; 12: return 0; 13: } Line (11) prints out nonsense, not the contents of our array. Let's apply the 2 steps from the pointers tutorial to our array on line (11): Image: Array with 4 elements {{:c:c_address_age_as_arr.jpg?400|}} * On line (11), The program finds and grabs the address reserved for age. Simple enough. According to the image, age starts at address 924 so it is address 924 that is grabbed. * The contents stored at that address are retrieved (Oops!). Step 2 is the problem. Our array has four values, not just one. To retrieve the contents at 924 would be inappropriate because 924 is the address of the first value in the array, not all values in the array. As a result, step 2 isn't performed on arrays, just step 1. The output from line (11) is the address of the array: step 2 has not occurred. In order to get a value from an array, the programmer must "manually" do step 2 to retrieve the contents. ===== Accessing elements in an array ===== From the [[c:pointer|pointer]] notes, we know that in order to get the contents at an address we use the * operator. Let's try it on our printing example: C Code Listing 9 1: #include 2: 3: int main() 4: { 5: short age[4]; 6: age[0]=23; 7: age[1]=34; 8: age[2]=65; 9: age[3]=74; 10: 11: printf("%d\n", *age); 12: return 0; 13: } C++ Code Listing 9 1: #include 2: 3: int main() 4: { 5: short age[4]; 6: age[0]=23; 7: age[1]=34; 8: age[2]=65; 9: age[3]=74; 10: 11: std::cout << *age << std::endl; 12: return 0; 13: } On line (11) the * operator has been added. When this code is compiled and executed, 23 prints to the screen. When just the array's name is provided, only step 1 occurs with an array. We then manually do step 2 to get the contents at that address--namely the value of the first element of our array. That's all fine and good, but what about the rest of the array? What if we want to access the second, third, or fourth element of our array? We know that when we use the name of an array, step 1 occurs and we get the address of the first element of that array. We also know (from the pointers tutorial) that an address is nothing more than an integer. Given that the address is nothing but an integer, we can use addition to access the other array elements! The big question is: How much do we add to age in line (11) above to get the address of the second element in the age array? Image: Array with 4 elements {{:c:c_address_age_as_arr.jpg?400|}} Did you answer "2"? Two is correct for anyone who's familiar with basic rules of addition--and 2 makes sense if you look at the image. The image shows the first element of age beginning at 924, the second at 926, the third at 928, and so on. However, in this case C and C++ don't follow the rules of arithmetic. C and C++ follow the rules of pointer arithmetic. I'm sorry. Don't blame me. I'm just the storyteller, I didn't invent the darned language. In C and C++ you add "1" to age (that is, the address of age) to access the second element, add "2" to age to access the third element, etc. If you hadn't seen the fancy image showing that each element of age (each short variable) had 2 bytes, then you might have said "1" and gotten the correct answer. We'll explain all of this in just a bit. But first let's see what that looks like in code. Here is how we can print the value of the second element in the array: C Code Listing 10 1: #include 2: 3: int main() 4: { 5: short age[4]; 6: age[0]=23; 7: age[1]=34; 8: age[2]=65; 9: age[3]=74; 10: 11: printf("%d\n", *(age+1)); 12: return 0; 13: } C++ Code Listing 10 1: #include 2: 3: int main() 4: { 5: short age[4]; 6: age[0]=23; 7: age[1]=34; 8: age[2]=65; 9: age[3]=74; 10: 11: std::cout << *(age+1) << std::endl; 12: return 0; 13: } Line (11) now shows pointer arithmetic in action: * Step 1 is done on age--the address 924 is returned. * Using pointer arithmetic, 1 is added to 924 which produces the address of the next element in the array. Looking at the image of our example, the result is 926. * Using the * operator ("do step 2 on a number"), the contents at that address are retrieved. The contents at address 926 is the number 34, which is what prints to the screen. The parentheses are required to ensure that the * operator and the + operator occur in the correct order. This is due to operator precedence rules. ===== Pointer Arithmetic ===== "This new learning amazes me, Sir Bedevere! Explain again, how sheep's bladders can be employed to prevent earthquakes." -King Arthur, Monty Python and the Holy Grail Why is pointer arithmetic different from normal arithmetic? If you recall from the pointers tutorial, pointers in C and C++ are typed. That is, in C and C++ you can have float pointers, int pointers, char pointers, etc. Here's an excerpt from a side note in the pointers tutorial: ...A primary motivation for having a pointer type for each variable type is to help the compiler. Referring back to an earlier example, when we get the contents at an address ("do step 2 on a number"), the compiler must know how to get the contents at the address. The compiler needs to know two things when getting the contents at an address: * How many bytes starting at the given address make up the value. For example, does the address refer to a 2 byte short int or a 4 byte float or an 8 byte double? * How to convert those bytes into the value. That is, the process for converting 4 bytes into a float is different than the process for converting 2 bytes into a short int. We've seen that when the name of an array is specified all by itself, only step 1 is done, not step 2. After step 1, we have the address to the first element in the array. In other words, it is a pointer! What kind of pointer? The type of the pointer is the same type as the array. In our sample program we created an array of short integers. So, the address is a short pointer (that is, it is a: short*). The type of the pointer not only allows the compiler to correctly get the contents at an address, but it also allows the compiler to correctly calculate the number of bytes to add when doing pointer arithmetic: 1: int miles[4]; 2: miles[0]=342; 3: miles[1]=86; 4: miles[2]=911; 5: miles[3]=7; 6: ... *(miles+2) ... Image: int array {{:c:c_address_arr_int.jpg?400|}} In line (4), miles is the address of an integer, or an int*. By the rules of pointer arithmetic, line (4) will get the contents at address 928. 1: float gpa[4]; 2: gpa[0]=2.3; 3: gpa[1]=4.0; 4: gpa[2]=3.27; 5: gpa[3]=0.0; 6: ... *(gpa+2) ... Image: float array {{:c:c_address_arr_float.jpg?400|}} In line (4), gpa is the address of a floating point number, or a float*. By the rules of pointer arithmetic, line (4) will get the contents at address 932. 1: char letter[4]; 2: letter[0]='M'; 3: letter[1]='a'; 4: letter[2]='r'; 5: letter[3]='k'; 6: ... *(letter+2) ... Image: char array {{:c:c_address_arr_char.jpg?400|}} In line (4), letter is the address of an character, or a char*. By the rules of pointer arithmetic, line (4) will get the contents at address 926. ===== Array Basics Revisited ===== Now that we have some insight about C and C++ array quirks, let's take another look at how we started accessing array elements. Prior to diving into this whole step 1 step 2 business with arrays we were initializing, printing, copying, and in general accessing array elements using square braces: C Code Listing 11 1: #include 2: 3: int main() 4: { 5: short age[4]; 6: age[0]=23; 7: age[1]=34; 8: age[2]=65; 9: age[3]=74; 10: 11: printf("%d\n", age[2]); 12: return 0; 13: } C++ Code Listing 11 1: #include 2: 3: int main() 4: { 5: short age[4]; 6: age[0]=23; 7: age[1]=34; 8: age[2]=65; 9: age[3]=74; 10: 11: std::cout << age[2] << std::endl; 12: return 0; 13: } On lines (6-9) we are using square braces to access elements for initialization. On line (11) we are using square braces to print out the third element of the array. From the previous section you might have been led to believe that you would print out the third element by putting something like *(age+2). As it turns out, the *(age+2) syntax is the "correct" syntax, in terms of how the language handles arrays (and pointers). The square braces syntax is nothing more than a shorthand for the pointer syntax. When you type age[2], the compiler internally converts it to: *(age+2). In fact, whenever you use the square braces, the compiler just blindly substitutes the first term into the left-hand side of the addition and the second term into the right-hand side of the addition: Image: blind substitution from array syntax to pointer arithmetic syntax {{:c:c_address_arr_template.jpg?300|}} This is actually what happens. If you type monkey[ crackers ] , the compiler tries to compile *(monkey + crackers) I'm not kidding! Let's prove it. When as a youngster you were asked to add 5+3, no problem. But what about when the numbers were switched and you were asked to add 3+5? Still no problem because of the commutative property of addition. The commutative property of addition holds in C and C++ just like it did when you were a youngster. So, *(age+2) produces exactly the same result as *(2+age). That is, the number of bytes equivalent to two short integers is added to age and then the contents are retrieved at that address. OK, so what? In the previous paragraph I said that when you write something like age[2] the compiler blindly substitutes it into its pointer syntax and compiles *(age+2). So, let's turn it around and prove that all of this array stuff is nothing but simple pointer access with a little addition: C Code Listing 12 1: #include 2: 3: int main() 4: { 5: short age[4]; 6: 0[age]=23; 7: 1[age]=34; 8: 2[age]=65; 9: 3[age]=74; 10: 11: printf("%d\n", 2[age]); 12: return 0; 13: } C++ Code Listing 12 1: #include 2: 3: int main() 4: { 5: short age[4]; 6: 0[age]=23; 7: 1[age]=34; 8: 2[age]=65; 9: 3[age]=74; 10: 11: std::cout << 2[age] << std::endl; 12: return 0; 13: } Image: blind substitution from array syntax to pointer arithmetic syntax is equivalent to Image: reversed--blind substitution from array syntax to pointer arithmetic syntax |{{:c:c_address_arr_template1.jpg?%300|}}|is equivalent to|{{:c:c_address_arr_template2.jpg?300|}}| The commutative property of addition is being taken advantage of in lines (6-9) and line (11). Don't just stare at the code above, type it in, compile it, and run it! Again, this is nothing but simple pointer arithmetic. For example, on line (11), the third element of the age array is accessed. Before compiling 2[age], the compiler first blindly converts it into *(2+age). Pretty cool, eh? We now have more than enough information to answer the question: "Why do arrays in C and C++ start at zero instead of one?". We know that the name of the array produces the address of the beginning of the array; and we know that the beginning of the array is also the address of the first element of the array. So, all of these are exactly the same: * *age * *(age+0) * age[0] The bottom line: Arrays are nothing more than a bunch of variables of the same type using the same name--that name is the address of the beginning of the bunch of variables. Re-summarizing, to access any element in an array provide: * The name of the array. This gets the address of the first element of the array. * A number. Pointer arithmetic is used to add this number to the address resulting in the address of one of the elements in the array. Of course if zero is added the address is unchanged (it remains the address of the first element). * Use the * operator to get the contents at that address. Alternately use the square bracket syntax and the compiler will do the addition and content retrieval for you.