Programming C# 12 - Chapter 5. Collections
Most programs need to deal with multiple pieces of data. Your code might have to iterate through some transactions to calculate the balance of an account, for example, or display recent messages in a social media web application, or update the positions of characters in a game. In most kinds of applications, the ability to work with collections of information is likely to be important.
C# offers a simple kind of collection called an array. The CLR’s type system supports arrays intrinsically, so they are efficient, but for some scenarios they can be too basic. The runtime libraries build on the fundamental services provided by arrays to provide more powerful and flexible collection types. I’ll start with arrays, because they are the foundation of most of the collection classes.
Arrays
An array is an object that contains multiple elements of a particular type. Each element is a storage location similar to a field, but whereas with fields we give each storage slot a name, array elements are simply numbered. The number of elements is fixed for the lifetime of the array, so you must specify the size when you create it. Example 5-1 shows the syntax for creating new arrays.
Example 5-1. Creating arrays
int[] numbers = new int[10];
string[] strings = new string[numbers.Length];
As with all objects, we construct an array with the new keyword followed by a type name, but instead of parentheses with constructor arguments, we put square brackets containing the array size. As the example shows, the expression defining the size can be a constant, but it doesn’t have to be—the second array’s size will be determined by evaluating numbers.Length at runtime. In this case, the second array will have 10 elements because we’re using the first array’s Length property. All arrays have this read-only property, and it returns the total number of elements in the array.
The Length property’s type is int, which means it can cope with arrays of up to about 2.1 billion elements. In a 32-bit process, the limiting factor on array size is likely to be available address space, but in 64-bit processes, larger arrays are possible, so there’s also a LongLength property of type long. However, you don’t see that used much, because the runtime does not currently support creation of arrays with more than 2,147,483,591 (0x7FFFFFC7) elements in any single dimension. So only rectangular multidimensional arrays (described later in this chapter) can contain more elements than Length can report. And even those have an upper limit of 4,294,967,295 (0xFFFFFFFF) elements on the current version of .NET.
In Example 5-1, I’ve broken my normal rule of avoiding redundant type names in variable declarations. The initializer expressions make it clear that the variables are arrays of int and string, respectively, so I’d normally use var for this sort of code, but I’ve made an exception here so that I can show how to write the name of an array type. Array types are distinct types in their own right, and if we want to refer to the type that is a single dimensional array of some particular element type, we put [] after the element type name.
All array types derive from a common base class called System.Array. This defines the Length and LongLength properties and various other members we’ll be looking at in due course. You can use array types in all the usual places you can use other types. So you could declare a field, or a method parameter of type string[]. You can also use an array type as a generic type argument. For example, IEnumerable<int[]> would be a sequence of arrays of integers (each of which could be a different size).
An array type is always a reference type, regardless of the element type. Nonetheless, the choice between reference type and value type elements makes a significant difference in an array’s behavior. As discussed in Chapter 3, when an object has a field with a value type, the value itself lives inside the memory allocated for the object. The same is true for arrays—when the elements are value types, the value lives in the array element itself, but with a reference type, elements contain only references. Each instance of a reference type has its own identity, and since multiple variables may all end up referring to that instance, the CLR needs to manage its lifetime independently of any other object, so it will end up with its own distinct block of memory. So while an array of 1,000 int values can all live in one contiguous memory block, with reference types, the array just contains the references, not the actual instances. An array of 1,000 different strings would need 1,001 heap blocks—one for the array and one for each string.
Note
When using reference type elements, you’re not obliged to make every element in an array of references refer to a distinct object. You can leave as many elements as you like set to null, and you’re also free to make multiple elements refer to the same object. This is just another variation on the theme that references in array elements work in much the same way as they do in local variables and fields.
To access an element in an array, we use square brackets containing the index of the element we’d like to use. The index is zero-based. Example 5-2 shows a few examples.
Example 5-2. Accessing array elements
// Continued from Example 5-1
numbers[0] = 42;
numbers[1] = numbers.Length;
numbers[2] = numbers[0] + numbers[1];
numbers[numbers.Length - 1] = 99;
As with the array’s size at construction, the array index can be a constant, but it can also be a more complex expression, calculated at runtime. In fact, that’s also true of the part that comes directly before the opening bracket. In Example 5-2, I’ve just used a variable name to refer to an array, but you can use brackets after any array-typed expression. Example 5-3 retrieves the first element of an array returned by a method call. (The details of the example aren’t strictly relevant, but in case you’re wondering, it finds the copyright message associated with the component that defines an object’s type. For example, if you pass a string to the method, it will return “© Microsoft Corporation. All rights reserved.” This uses the reflection API and custom attributes, the topics of Chapters 13 and 14.)
Example 5-3. Convoluted array access
public static string GetCopyrightForType(object o)
{
Assembly asm = o.GetType().Assembly;
var copyrightAttribute = (AssemblyCopyrightAttribute)
**asm.GetCustomAttributes(typeof(AssemblyCopyrightAttribute), true)[0];**
return copyrightAttribute.Copyright;
}
Expressions involving array element access are special, in that C# considers them to be a kind of variable. This means that as with local variables and fields, you can use them on the lefthand side of an assignment statement, whether they’re simple, like the expressions in Example 5-2, or more complex, like those in Example 5-3. You can also use them with the ref keyword (as described in Chapter 3) to pass a reference to a particular element to a method, to store it in a ref local variable, or to return it from a method with a ref return type.
The CLR always checks the index against the array size. If you try to use either a negative index or an index greater than or equal to the length of the array, the runtime will throw an IndexOutOfRangeException.
Although the size of an array is invariably fixed, its contents are always modifiable—there is no such thing as a read-only array. (As we’ll see in “ReadOnlyCollection
Example 5-4. How not to modify an array with immutable elements
var values = new Complex[10];
// These lines both cause compiler errors:
values[0].Real = 10;
values[0].Imaginary = 1;
The compiler complains because the Real and Imaginary properties are read-only; Complex does not provide any way to modify its values. Nevertheless, you can modify the array: even if you can’t modify an existing element in place, you can always overwrite it by supplying a different value, as Example 5-5 shows.
Example 5-5. Modifying an array with immutable elements
var values = new Complex[10];
values[0] = new Complex(10, 1);
Read-only arrays wouldn’t be much use in any case, because all arrays start out filled with a default value that you don’t get to specify. The CLR fills the memory for a new array with zeros, so you’ll see 0, null, or false, depending on the array’s element type.
Warning
If you write a zero-argument constructor for a struct, you might have expected array creation to invoke constructors of this kind automatically. It does not.
For some applications, all-zero (or equivalent) content might be a useful initial state for an array, but in some cases, you’ll want to set some other content before starting to work.
Array Initialization
The most straightforward way to initialize an array is to assign values into each element in turn. Example 5-6 creates a string array, and since string is a reference type, creating a five-element array doesn’t create five strings. Our array starts out with five nulls. (This is true even if you’ve enabled C#’s nullable references feature, as described in Chapter 3. Unfortunately, array initialization is one of the holes that make it impossible for that feature to offer absolute guarantees of non-nullness.) So the example goes on to populate each array element with a reference to a string.
Example 5-6. Laborious array initialization
var workingWeekDayNames = new string[5];
workingWeekDayNames[0] = "Monday";
workingWeekDayNames[1] = "Tuesday";
workingWeekDayNames[2] = "Wednesday";
workingWeekDayNames[3] = "Thursday";
workingWeekDayNames[4] = "Friday";
This works, but it is unnecessarily verbose. C# supports a shorter syntax that achieves the same thing, shown in Example 5-7. The compiler turns this into code that works like Example 5-6. With arrays of numbers (e.g., an int[]) the shorter syntax can generate more efficient code: instead of emitting a series of assignments, it can embed the array’s values as a chunk of raw data in the compiled output, enabling it to initialize the entire array from that data with a single call to a .NET runtime library helper function. (The embedded data is read-only, and will be copied into a new array each time the code runs, so the effect is equivalent to sequence of assignments. It’s just faster, and it results in more compact compiled code.)
Example 5-7. Initializing an array with a collection expression
string[] workingWeekDayNames =
["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"];
The syntax in Example 5-7 is called a collection expression, and it is new in C# 12.0. As you’ll see later, you can also use it with other collection types. It is now the preferred way to initialize sequence-like collections because it is more compact than the older approaches, it supports a wider range of collection types, and the compiler strives to generate the most efficient initialization code possible for whichever collection type you have chosen. (If you upgrade an older project to .NET 8.0, you will typically see several messages from the compiler suggesting that you move over to this new syntax.)
Collection expressions can copy data from other collections using the spread syntax illustrated in Example 5-8. This example creates a new array by copying the elements from the array in Example 5-7 into a new array with additional elements at the start and end. Spread elements are not required to be of precisely the same type—although this example spreads one string[] into a new string[], I could have incorporated any collection representing a sequence of string elements, such as the List
Example 5-8. Using a spread element to copy an array as part of a collection expression
string[] weekDayNames = ["Sunday", .. workingWeekDayNames, "Saturday"];
Collection expressions are designed to resemble the list patterns you saw in Chapter 2, which is why spreads use the same .. syntax as the slices that can appear in list patterns. However, while a list pattern can contain at most one slice, we’re not limited to a single spread: a collection expression may contain any number of spread expressions.
Since collection expressions are new in C# 12.0, you are likely to come across existing code that uses the older array initializer syntax shown in Example 5-9. That particular example is more verbose, because it explicitly states that we want a new array of type string[]. This means that there are still uses for this older syntax: if the compiler is unable to infer the required collection type, you can’t use a collection expression because it’s not clear whether you want an array, a List
Example 5-9. Array initializer syntax
var workingWeekDayNames = new string[]
{ "Monday", "Tuesday", "Wednesday", "Thursday", "Friday" };
As it happens, C# has long supported one special case in which the older collection initializer syntax can omit the new string[] part. Example 5-10 shows that if you specify the type explicitly in the variable declaration, you can write just the initializer list, leaving out the new keyword. This works only in variable initializer expressions, by the way; you can’t use this syntax to create an array in other expressions, such as assignments or method arguments. (The newer collection expression syntax works in all those contexts, as does the more verbose initializer expression in Example 5-9.) There’s no practical difference between Example 5-10 and the newer syntax in Example 5-7, but since collection expressions offer additional flexibility in other contexts, and they are the most efficient way to initialize other collection types, they are likely to become the preferred idiomatic choice in these cases.
Example 5-10. Shorter array initializer syntax
string[] workingWeekDayNames =
{ "Monday", "Tuesday", "Wednesday", "Thursday", "Friday" };
The array initializer syntax offers yet another variation: if all the expressions inside the array initializer list are of the same type, the compiler can infer the array type, so we can write just new[] without an explicit element type. Example 5-11 does this.
Example 5-11. Array initializer syntax with element type inference
var workingWeekDayNames = new[]
{ "Monday", "Tuesday", "Wednesday", "Thursday", "Friday" };
That was actually slightly longer than Example 5-10. However, as with collection expressions and the other more verbose kinds of array initializers, this style is not limited to variable initialization. You can also use it when you need to pass an array as an argument to a method, for example. If the array you are creating will only be passed into a method and never referred to again, you may not want to declare a variable to refer to it. It might be neater to write the array directly in the argument list. Example 5-12 passes an array of strings to a method using this technique.
Example 5-12. Array as argument
SetHeaders(new[] { "Monday", "Tuesday", "Wednesday", "Thursday", "Friday" });
In most cases, the new collection expression syntax is likely to be a better choice, but there are some circumstances where its flexibility works against it. If the SetHeaders method invoked by Example 5-12 takes a first argument of type string[], then a collection expression would be a more succinct choice. However, it would be different if its first argument’s type was IEnumerable
Tip
If you write code that initializes an array with a fixed set of contents, like some of the preceding examples, you should consider putting the array into a static readonly field. That way, the array is created just once and reused. Unless you really need a new array instance every time the code runs (e.g., because you go on to modify the array, or because the contents aren’t the same every time), or the method in question will be invoked only once, it can be more efficient to reuse the same array object each time.
Searching and Sorting
Sometimes, you will not know the index of the array element you need. For example, suppose you are writing an application that shows a list of recently used files. Each time the user opens a file in your application, you would want to bring that file to the top of the list, and you’d need to detect when the file was already in the list to avoid having it appear multiple times. If the user happened to use your recent file list to open the file, you would already know it’s in the list and at what offset. But what if the user opens the file some other way? In that case, you’ve got a filename, and you need to find out where that appears in your list, if it’s there at all.
Arrays can help you find the item you want in this kind of scenario. There are methods that examine each element in turn, stopping at the first match, and there are also methods that can work considerably faster if your array stores its elements in a particular order. To help with that, there are also methods for sorting the contents of an array into whichever order you require.
The static Array.IndexOf method provides the most straightforward way to search for an element. It does not need your array elements to be in any particular order: you just pass it the array in which to search and the value you’re looking for, and it will walk through the elements until it finds a value equal to the one you want. It returns the index at which it found the first matching element, or −1 if it reached the end of the array without finding a match. Example 5-13 shows how you might use this method as part of the logic for updating a list of recently opened files.
Example 5-13. Searching with IndexOf
int recentFileListIndex = Array.IndexOf(myRecentFiles, openedFile);
if (recentFileListIndex < 0)
{
AddNewRecentEntry(openedFile);
}
else
{
MoveExistingRecentEntryToTop(recentFileListIndex);
}
That example starts its search at the beginning of the array, but you have other options. The IndexOf method is overloaded, and you can pass an index from which to start searching and optionally a second number indicating how many elements you want it to look at before it gives up. There’s also a LastIndexOf method, which works in reverse. If you do not specify an index, it starts from the end of the array and works backward. As with IndexOf, you can provide one or two more arguments, indicating the offset at which you’d like to start and the number of elements to check.
These methods are fine if you know precisely what value you’re looking for, but often, you’ll need to be a bit more flexible: you may want to find the first (or last) element that meets some particular criteria. For example, suppose you have an array representing the bin values for a histogram. It might be useful to find out which is the first nonempty bin. So rather than searching for a particular value, you’d want to find the first element with any value other than zero. Example 5-14 shows how to use the FindIndex method to locate the first such entry.
Example 5-14. Searching with FindIndex
public static int GetIndexOfFirstNonEmptyBin(int[] bins)
=> Array.FindIndex(bins, IsNonZero);
private static bool IsNonZero(int value) => value != 0;
My IsNonZero method contains the logic that decides whether any particular element is a match, and I’ve passed that method as an argument to FindIndex. You can pass any method with a suitable signature—FindIndex requires a method that takes an instance of the array’s element type and returns a bool. (Strictly speaking, it takes a Predicate
By the way, the logic for this particular example is so simple that writing a separate method for the condition is probably overkill. For simple cases such as these, you’d almost certainly use the lambda syntax (using => to indicate that an expression represents an inline function) instead. That’s also something I’ll be discussing in Chapter 9, so this is jumping ahead, but I’ll just show how it looks because it’s more concise. Example 5-15 has exactly the same effect as Example 5-14 but doesn’t require us to declare and write a whole extra method explicitly.
Example 5-15. Using a lambda with FindIndex
public static int GetIndexOfFirstNonEmptyBin(int[] bins)
=> Array.FindIndex(bins, value => value != 0);
As with IndexOf, FindIndex provides overloads that let you specify the offset at which to start searching and the number of elements to check before giving up. The Array class also provides FindLastIndex, which works backward—it corresponds to LastIndexOf, much as FindIndex corresponds to IndexOf.
When you’re searching for an array entry that meets some particular criteria, you might not be all that interested in the index of the matching element—you might need to know only the value of the first match. Obviously, it’s pretty easy to get that: you can just use the value returned by FindIndex in conjunction with the array index syntax. However, you don’t need to, because the Array class offers Find and FindLast methods that search in precisely the same way as FindIndex and FindLastIndex but that return the first or last matching value instead of returning the index at which that value was found.
An array could contain multiple items that meet your criteria, and you might want to find all of them. You could write a loop that calls FindIndex, adding one to the index of the previous match and using that as the starting point for the next search, repeating until either reaching the end of the array or getting a result of −1, indicating that no more matches were found. And that would be the way to go if you needed to know the index of each match. But if you are interested only in knowing all of the matching values, and do not need to know exactly where those values were in the array, you could use the FindAll method shown in Example 5-16 to do all the work for you.
Example 5-16. Finding multiple items with FindAll
public static T[] GetNonNullItems<T>(T[] items) where T : class
=> Array.FindAll(items, value => value != null);
This takes any array with reference type elements and returns an array that contains only the non-null elements in that array.
All of the search methods I’ve shown so far run through an array’s elements in order, testing each element in turn. This works well enough, but with large arrays it may be unnecessarily expensive, particularly in cases where comparisons are relatively complex. Even for simple comparisons, if you need to deal with arrays with millions of elements, this sort of search can take long enough to introduce visible delays. However, we can do much better. For example, given an array of values sorted into ascending order, a binary search can perform many orders of magnitude better. Example 5-17 shows two methods. The first, Sort, sorts an array of numbers into ascending order. And if we have such a sorted array, we can then pass it to Find, which uses the Array.BinarySearch method.
Example 5-17. Sorting an array, and BinarySearch
void Sort(int[] numbers)
{
Array.Sort(numbers);
}
int Find(int[] numbers, int searchFor)
{
return Array.BinarySearch(numbers, searchFor);
}
Binary search is a widely used algorithm that exploits the fact that its input is sorted to be able to rule out half of the array at each step. It starts with the element in the middle of the array. If that happens to be the value required, it can stop, but otherwise, depending on whether the value it found is higher or lower than the value we want, it can know instantly which half of the array the value will be in (if it’s present at all). It then leaps to the middle of the remaining half, and if that’s not the right value, again it can determine which quarter will contain the target. At each step, it narrows the search down by half, and after halving the size a few times, it will be down to a single item. If that’s not the value it’s looking for, the item it wants is missing.
Tip
BinarySearch produces negative numbers when the value is not found. In these cases, this binary chop process will finish at the value nearest to the one we are looking for, and that might be useful information. So a negative number still tells us the search failed, but that number is the negation of the index of the closest match.
A binary search is more complex than a simple linear search, but with large arrays, it pays off because far fewer iterations are needed. Given an array of 100,000,000 elements, it has to perform only 27 steps instead of 100,000,000. Obviously, with smaller arrays, the improvement is reduced, and there will be some minimum size of array at which the relative complexity of a binary search outweighs the benefit. If your array contains only 10 values, a linear search may well be faster. But a binary search is certainly the clear winner with 100,000,000 int elements. The cases that require the most work are where it finds no match (producing a negative result), and in these cases, BinarySearch determines that an element is missing around 20,000 times faster than the linear search performed by Array.IndexOf does. However, you need to take care: a binary search works only for data that is already ordered, and the cost of getting your data into order could well outweigh the benefits. With an array of 100,000,000 ints, you would need to do about 500 searches before the cost of sorting was outweighed by the improved search speed, and, of course, that would work only if nothing changed in the meantime that forced you to redo the sort. With performance tuning, it’s always important to look at the whole scenario and not just the microbenchmarks.
Incidentally, Array.BinarySearch offers overloads for searching within some subsection of the array, similar to those we saw for the other search methods. It also lets you customize the comparison logic. This works with the comparison interfaces I showed in earlier chapters. By default, it will use the IComparable
There are other searching and sorting methods besides the ones provided by the Array class itself. All arrays implement IEnumerable
Multidimensional Arrays
The arrays I’ve shown so far have all been one-dimensional, but C# supports two multidimensional forms: jagged arrays and rectangular arrays.
Jagged arrays
A jagged array is simply an array of arrays. The existence of this kind of array is a natural upshot of the fact that arrays have types that are distinct from their element type. Because int[] is a type, you can use that as the element type of another array. Example 5-18 shows how to create an array of arrays of integers.
Example 5-18. Creating a jagged array with collection expressions
int[][] arrays =
[
[1, 2],
[1, 2, 3, 4, 5, 6],
[1, 2, 4],
[1],
[1, 2, 3, 4, 5]
];
I’ve used the collection expression syntax to initialize this array. As mentioned earlier, collection expressions are new in C# 12.0. Example 5-19 shows how to create an identical array with the older array initialization syntax.
Example 5-19. Creating a jagged array with array initializers
var arrays = new int[5][]
{
new[] { 1, 2 },
new[] { 1, 2, 3, 4, 5, 6 },
new[] { 1, 2, 4 },
new[] { 1 },
new[] { 1, 2, 3, 4, 5 }
};
I’ve done something that wasn’t strictly necessary here because I want to show a slightly odd feature of the jagged array constructor syntax. When using either collection expressions or array initializers, you don’t have to specify the size explicitly, because the compiler will work it out for you. I’ve exploited that for the nested arrays in Example 5-19, but I’ve set the size (5) explicitly for the outer array. I could have omitted it, but I wanted to show where the size appears, because it might not be where you would expect, particularly once you think about what the type name for an array really means.
In general, array types have the form ElementType[], so if the element type is int[], we’d expect the resulting array type to be written as int[][], and that’s what we see. The constructor syntax is a bit more peculiar. It declares an array of five arrays, and at a first glance, new int[5][] seems like a perfectly reasonable way to express that. It is consistent with array index syntax for jagged arrays; we can write arrays[1][3], which fetches the second of those five arrays and then retrieves the fourth element from that second array. This is not a specialized syntax, by the way—there is no need for special handling here, because any expression that evaluates to an array can be followed by the index in square brackets. The expression arrays[1] evaluates to an int[] array, and so we can follow that with [3].
However, the new keyword does treat jagged arrays specially. It makes them look consistent with array element access syntax, but it has to twist things a little to do that. With a one-dimensional array, the pattern for constructing a new array is new ElementType[length], so for creating an array of five things, you’d expect to write new ElementType[5]. If the things you are creating are arrays of int, wouldn’t you expect to see int[] in place of ElementType? That would imply that the syntax should be new int[][5].
That would be logical, but it looks like it’s the wrong way around, and that’s because the array type syntax itself is effectively reversed. Arrays are constructed types, like generics. With generics, the name of the generic type from which we construct the actual type comes before the type argument (e.g., List
Note
The syntax extends in the obvious way—for example, int[][][] for the type and new int[5][][] for construction. C# does not define particular limits to the number of dimensions, but there are some implementation-specific runtime limits. (Microsoft’s compiler didn’t flinch when I asked for a 4,000-dimensional jagged array, but the CLR refused to load the resulting program. In fact, it wouldn’t load anything with more than 2,816 dimensions.)
Examples 5-18 and 5-19 both initialize an array with five one-dimensional int[] arrays. The layout of the code should make it fairly clear why this sort of array is referred to as jagged: each row has a different length. With arrays of arrays, there is no requirement for a rectangular layout. I could go further. Arrays are reference types, so I could have set some rows to null. If I abandoned both the collection expression and array initializer syntaxes and initialized the array elements individually, I could have decided to make some of the one-dimensional int[] arrays appear in more than one row.
Because each row in this jagged array contains an array, I’ve ended up with six objects here—the five int[] arrays and then the int[][] array that contains references to them. If you introduce more dimensions, you’ll get yet more arrays. For certain kinds of work, the nonrectangularity and the large numbers of objects can be problematic, which is why C# supports another kind of multidimensional array.
Rectangular arrays
A rectangular array is a single array object that supports multidimensional indexing. If C# didn’t offer multidimensional arrays, we could build something a bit like them by convention. If you want an array with 10 rows and 5 columns, you could construct a one-dimensional array with 50 elements, and then use code like myArray[i + (5 * j)] to access it, where i is the column index and j is the row index. That would be an array that you had chosen to think of as being two-dimensional, even though it’s really just one big contiguous block. A rectangular array is essentially the same idea, but where C# does the work for you. Example 5-20 shows how to declare and construct rectangular arrays. (This is one scenario in which we can’t use the new collection expression syntax, because that only knows how to create one-dimensional collections. It works for jagged arrays because those are one-dimensional arrays of one-dimensional arrays: each object constructed has just a single dimension. With rectangular arrays, we have a single multidimensional object.)
Note
Rectangular arrays are not just about convenience. There’s a type safety aspect too: int[,] is a different type than int[] or int[,,], so if you write a method that expects a two-dimensional rectangular array, C# will not allow anything else to be passed.
Example 5-20. A rectangular array
int[,] grid = new int[5, 10];
var smallerGrid = new int[,]
{
{ 1, 2, 3, 4 },
{ 2, 3, 4, 5 },
{ 3, 4, 5, 6 }
};
Rectangular array type names use only a single pair of square brackets, no matter how many dimensions they have. The number of commas inside the brackets denotes the number of dimensions, so this example with one comma is two-dimensional. The runtime seems to impose a much lower limit on the number of dimensions than for a jagged array. .NET 8.0 won’t load a program that tries to use more than 32 dimensions in a rectangular array.
The multidimensional initializer syntax is very similar to the older initializer syntax for multidimensional arrays (see Example 5-19), except I do not start each row with new[], because this is one big array, not an array of arrays. The numbers in Example 5-20 form a shape that is clearly rectangular, and if you attempt to make things jagged (with different row sizes), the compiler will report an error. This extends to higher dimensions. If you wanted a three-dimensional “rectangular” array, it would need to be a cuboid. Example 5-21 shows a cuboid array. You could think of the initializer as being a list of two rectangular slices making up the cuboid. And you can go higher, with hypercuboid arrays (although they are still known as rectangular arrays, regardless of how many dimensions you use).
Example 5-21. A 2 × 3 × 5 cuboid “rectangular” array
var cuboid = new int[,,]
{
{
{ 1, 2, 3, 4, 5 },
{ 2, 3, 4, 5, 6 },
{ 3, 4, 5, 6, 7 }
},
{
{ 2, 3, 4, 5, 6 },
{ 3, 4, 5, 6, 7 },
{ 4, 5, 6, 7, 8 }
}
};
The syntax for accessing rectangular arrays is predictable enough. If the second variable from Example 5-20 is in scope, we could write smallerGrid[2, 3] to access the final item in the array; as with single-dimensional arrays, indices are zero-based, so this refers to the third row’s fourth item.
Remember that an array’s Length property returns the total number of elements in the array. Since rectangular arrays have all the elements in a single array (rather than being arrays that refer to some other arrays), this will return the product of the sizes of all the dimensions. A rectangular array with 5 rows and 10 columns would have a Length of 50, for example. If you want to discover the size along a particular dimension at runtime, use the GetLength method, which takes a single int argument indicating the dimension for which you’d like to know the size.
Copying and Resizing
Sometimes you will want to move chunks of data around in arrays. You might want to insert an item in the middle of an array, moving the items that follow it up by one position (and losing one element at the end, since array sizes are fixed). Or you might want to move data from one array to another, perhaps one of a different size.
The static Array.Copy method takes two references to arrays, along with a number indicating how many elements to copy. It offers overloads so that you can specify the positions in the two arrays at which to start the copy. (The simpler overload starts at the first element of each array.) You are allowed to pass the same array as the source and destination, and it will handle overlap correctly: the copy acts as though the elements were first all copied to a temporary location before starting to write them to the target.
Warning
As well as the static Copy method, the Array class defines a nonstatic CopyTo method, which copies the entire array into a target array, starting at the specified offset. This method is present because all arrays implement certain collection interfaces, including ICollection
Copying elements from one array to another can become necessary when you need to deal with variable amounts of data. You would typically allocate an array larger than initially necessary, and if this eventually fills up, you’ll need a new, larger array, and you’d need to copy the contents of the old array into the new one. In fact, the Array class can do this for you for one-dimensional arrays with its Resize method. The method name is slightly misleading, because arrays cannot be resized, so it allocates a new array and copies the data from the old one into it. Resize can build either a larger or a smaller array, and if you ask it for a smaller one, it will just copy as many elements as will fit.
While I’m talking about methods that copy the array’s data around, I should mention Reverse, which simply reverses the order of the array’s elements. Also, while this isn’t strictly about copying, the Array.Clear method is often useful in scenarios where you’re juggling array sizes—it allows you to reset some range of the array to its initial zero-like state.
These methods for moving data around within arrays are useful for building more flexible data structures on top of the basic services offered by arrays. But you often won’t need to use them yourself, because the runtime libraries provide several useful collection classes that do this for you.
List
The List
Although code that uses an indexer resembles array element access, it is not quite the same thing. An indexer is a kind of property, so it has the same issues with mutable value types that I discussed in Chapter 3. Given a variable pointList of type List
Unlike an array, a List
Note
List
Example 5-22 shows one way to create a List
Example 5-22. Using a List<T>
var numbers = new List<int>();
numbers.Add(123);
numbers.Add(99);
numbers.Add(42);
Console.WriteLine(numbers.Count);
Console.WriteLine($"{numbers[0]}, {numbers[1]}, {numbers[2]}");
numbers[1] += 1;
Console.WriteLine(numbers[1]);
numbers.RemoveAt(1);
Console.WriteLine(numbers.Count);
Console.WriteLine($"{numbers[0]}, {numbers[1]}");
Because a List
If you know up front that you will eventually store a specific number of elements in a list, you can pass that number to the constructor, and it will allocate exactly that much capacity, meaning that no further reallocation will be required. If you get this wrong, it won’t cause an error—you’re just requesting an initial capacity, and it’s OK to change your mind later.
If the idea of unused memory going to waste in a list offends you, but you don’t know exactly how much space will be required before you start, you could call the TrimExcess method once you know the list is complete. This reallocates the internal storage to be exactly large enough to hold the list’s current contents, eliminating waste. This will not always be a win. To ensure that it is using exactly the right amount of space, TrimExcess has to create a new array of the right size, leaving the old, oversized one to be reclaimed by the garbage collector later on, and in some scenarios, the overhead of forcing an extra allocation just to trim things down to size may be higher than the overhead of having some unused capacity.
Lists have a third constructor. Besides the default constructor, and the one that takes a capacity, you can also pass in a collection of data with which to initialize the list. You can pass any IEnumerable
You can provide initial content for lists in the same way as with an array, by writing a collection expression. Example 5-23 loads the same three values into a new list as at the start of Example 5-22.
Example 5-23. Initializing a list with a collection expression
List<int> numbers = [123, 99, 42];
As with arrays, the collection expression syntax was introduced in C# 12.0, and there is an older, slightly more verbose syntax. Example 5-24 has the same effect as Example 5-23.
Example 5-24. List initializer
var numbers = new List<int> { 123, 99, 42 };
If you’re not using var, you can omit the type name after the new keyword, as Example 5-25 shows. But in contrast to arrays, you cannot omit the new keyword entirely. I’ve used a target-typed new() expression here to avoid writing out the type name a second time.
Example 5-25. List initializer with target-typed new
List<int> numbers = new() { 123, 99, 42 };
Examples 5-24 and 5-25 are equivalent, and each compile into code that calls Add once for each item in the list. You can use this syntax with any type that has a suitable Add method and implements the IEnumerable interface. This works even if Add is an extension method. (So if some type implements IEnumerable, but does not supply an Add method, you are free to use this initializer syntax if you provide your own Add.)
Example 5-23 generates marginally more efficient code. The collection expression syntax not only recognizes the same pattern as the initializer syntax (an IEnumerable implementation and an Add method), but it also has some additional tricks. If the collection type offers a constructor with a single argument of type int named capacity, it will use that to construct the collection, passing in the number of elements that will be in the list once initialization is complete. List
List
We’ve now seen two ways to represent a list of values: arrays and lists. Fortunately, interfaces make it possible to write code that can work with either, so you won’t need to write two sets of functions if you want to support both lists and arrays.
List and Sequence Interfaces
The runtime libraries define several interfaces representing collections. Three of these are relevant to simple linear sequences of the kind you can store in an array or a list: IList
IEnumerable
Example 5-26. IEnumerable<T> and IEnumerable
public interface IEnumerable<out T> : IEnumerable
{
IEnumerator<T> GetEnumerator();
}
public interface IEnumerable
{
IEnumerator GetEnumerator();
}
Using inheritance, IEnumerable
Example 5-27. IEnumerator<T> and IEnumerator
public interface IEnumerator<out T> : IDisposable, IEnumerator
{
T Current { get; }
}
public interface IEnumerator
{
bool MoveNext();
object Current { get; }
void Reset();
}
The usage model for an IEnumerable
Note
Notice that IEnumerator
The foreach loop in C# does all of the work required to iterate through an enumerable collection for you,1 including generating code that calls Dispose even if the loop terminates early due to a break statement, an error, or, perish the thought, a goto statement. Chapter 7 will describe the uses of IDisposable in more detail.
IEnumerable
Example 5-28. IAsyncEnumerable<T> and IAsyncEnumerator<T>
public interface IAsyncEnumerable<out T>
{
IAsyncEnumerator<T> GetAsyncEnumerator(
CancellationToken cancellationToken = default);
}
public interface IAsyncEnumerator<out T> : IAsyncDisposable
{
T Current { get; }
ValueTask<bool> MoveNextAsync();
}
You can consume an IAsyncEnumerable
Although IEnumerable
Example 5-29. ICollection<T>
public interface ICollection<T> : IEnumerable<T>, IEnumerable
{
void Add(T item);
void Clear();
bool Contains(T item);
void CopyTo(T[] array, int arrayIndex);
bool Remove(T item);
int Count { get; }
bool IsReadOnly { get; }
}
This requires implementers also to provide IEnumerable
The third interface for sequential collections is IList
Example 5-30. IList<T>
public interface IList<T> : ICollection<T>, IEnumerable<T>, IEnumerable
{
int IndexOf(T item);
void Insert(int index, T item);
void RemoveAt(int index);
T this[int index] { get; set; }
}
Again, although there is a nongeneric IList, this interface has no direct relationship to it, even though both interfaces represent similar concepts—the nongeneric IList has equivalents to the IList
One unfortunate upshot of the way these three generic interfaces are related is that they do not provide an abstraction representing indexed collections that are read-only, or even ones that are fixed-size. While IEnumerable
Arrays mitigate this problem by using explicit interface implementation to hide the IList
Example 5-31. Trying (and failing) to enlarge an array
IList<int> array = new[] { 1, 2, 3 };
array.Add(4); // Will throw an exception
The Add method throws a NotSupportedException, with an error message stating that the collection has a fixed size. If you inspect the documentation for IList
This causes an irritating problem: if you’re writing code that does in fact require a modifiable collection, there’s no way to advertise that fact. If a method takes an IList
There is a ReadOnlyCollection
.NET defines IReadOnlyList
Note
Collections do not need to be read-only to implement IReadOnlyList
The issues and interfaces I’ve just discussed raise a question: When writing code or classes that work with collections, what type should you use? With private implementation details (e.g., private fields, or inside methods) you will typically specify concrete types, not interfaces, because you don’t need the flexibility to deal with different kinds of collections. But when it comes to the public-facing parts of your code (e.g., properties or method arguments) you will typically get the most flexibility if your API demands the least specific type it can work with. For example, if an IEnumerable
The interfaces we’ve just examined are not the only generic collection interfaces, because simple linear lists are not the only kind of collection. But before moving on to the others, I want to show enumerables and lists from the flip side: How do we implement these interfaces?
Implementing Lists and Sequences
It is often useful to provide information in the form of either an IEnumerable
You could implement these interfaces by hand, as none of them is particularly complicated. However, C# and the runtime libraries can help. There is direct language-level support for implementing IEnumerable
Implementing IEnumerable with Iterators
C# supports a special form of method called an iterator. An iterator is a method that produces enumerable sequences using the yield keyword. Example 5-32 shows a simple iterator and some code that uses it. This will display numbers counting down from 5 to 1.
Example 5-32. A simple iterator
static IEnumerable<int> Countdown(int start, int end)
{
for (int i = start; i >= end; --i)
{
yield return i;
}
}
foreach (int i in Countdown(5, 1))
{
Console.WriteLine(i);
}
An iterator looks much like any normal method, but the way it returns values is different. The iterator in Example 5-32 has a return type of IEnumerable
Example 5-33. A very simple iterator
public static IEnumerable<int> ThreeNumbers()
{
yield return 1;
yield return 2;
yield return 3;
}
Although this is fairly straightforward in concept, the way it works is somewhat involved because code in iterators does not run in the same way as other code. Remember, with IEnumerable
Example 5-34. An infinite iterator
public static IEnumerable<BigInteger> Fibonacci()
{
BigInteger v1 = 1;
BigInteger v2 = 1;
while (true)
{
yield return v1;
BigInteger tmp = v2;
v2 = v1 + v2;
v1 = tmp;
}
}
This iterator runs indefinitely; it has a while loop with a true condition, and it contains no break statement, so this will never voluntarily stop. If C# tried to run an iterator to completion before returning anything, it would get stuck here. (The numbers grow, so if it ran for long enough, the method would eventually terminate by throwing an OutOfMemoryException.) But if you try this, you’ll find it starts returning values from the Fibonacci series immediately and will continue to do so for as long as you continue to iterate through its output. Clearly, C# is not simply running the whole method before returning.
C# performs some serious surgery on your code to make this work. If you examine the compiler’s output for an iterator using a tool such as ILDASM (the disassembler for .NET code, provided with the .NET SDK), you’ll find it generates a private nested class that acts as the implementation for both the IEnumerable
Example 5-35. Implementing IEnumerable<T> by hand
public class FibonacciEnumerable :
IEnumerable<BigInteger>, IEnumerator<BigInteger>
{
private BigInteger v1;
private BigInteger v2;
private bool first = true;
public BigInteger Current => v1;
public void Dispose() { }
object IEnumerator.Current => Current;
public bool MoveNext()
{
if (first)
{
v1 = 1;
v2 = 1;
first = false;
}
else
{
BigInteger tmp = v2;
v2 = v1 + v2;
v1 = tmp;
}
return true;
}
public void Reset()
{
first = true;
}
public IEnumerator<BigInteger> GetEnumerator() =>
new FibonacciEnumerable();
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
This is not a particularly complex example, because its enumerator is essentially in either of two states—either it is running for the first time and therefore needs to run the code that comes before the loop or it is inside the loop. Even so, this code is much harder to read than Example 5-34, because the mechanics of supporting enumeration have obscured the essential logic.
The code would get even more convoluted if we needed to deal with exceptions. You can write using blocks and finally blocks, which enable your code to behave correctly in the face of errors, as I’ll show in Chapters 7 and 8, and the compiler can end up doing a lot of work to preserve the correct semantics for these when the method’s execution is split up over multiple iterations.2 You wouldn’t need to write too many enumerations by hand this way before being grateful that C# can do it for you.
Iterator methods don’t have to return an IEnumerable
One thing to be careful of with iterators is that they run very little code until the first time the caller calls MoveNext. So if you were to single-step through code that calls the Fibonacci method in Example 5-34, the method call would appear not to do anything at all. If you try to step into the method at the point at which it’s invoked, none of the code in the method runs. It’s only when iteration begins that you’d see your iterator’s body execute. This has a couple of consequences.
The first thing to bear in mind is that if your iterator method takes arguments, and you want to validate those arguments, you may need to do some extra work. By default, the validation won’t happen until iteration begins, so errors will occur later than you might expect. If you want to validate arguments immediately, you will need to write a wrapper. Example 5-36 shows an example—it provides a normal method called Fibonacci that doesn’t use yield return and will therefore not get the special compiler behavior for iterators. This normal method validates its argument before going on to call a nested iterator method. (This also illustrates that local methods can use yield return.)
Example 5-36. Iterator argument validation
public static IEnumerable<BigInteger> Fibonacci(int count)
{
ArgumentOutOfRangeException.ThrowIfNegative(count);
return Core(count);
static IEnumerable<BigInteger> Core(int count)
{
BigInteger v1 = 1;
BigInteger v2 = 1;
for (int i = 0; i < count; ++i)
{
yield return v1;
BigInteger tmp = v2;
v2 = v1 + v2;
v1 = tmp;
}
}
}
The second thing to remember is that iterators may execute several times. IEnumerable
Collection
If you look at types in the runtime libraries, you’ll find that when they offer properties that expose an implementation of IList
The runtime libraries provide a Collection
You typically derive a class from Collection
ReadOnlyCollection
If you want to provide a nonmodifiable collection, then instead of using Collection
If your collection’s element type is a reference type, making the collection read-only does not prevent the objects to which the elements refer from being modified. I can retrieve, say, the 12th element from a read-only collection, and it will hand me back a reference. Fetching a reference counts as a read-only operation, but now that I have got that reference, the collection object is out of the picture, and I am free to do whatever I like with the reference. C# does not offer any concept of a read-only reference for reference types,3 so the only way to present a truly read-only collection is to use an immutable type in conjunction with an IReadOnlyCollection
There are two ways to use ReadOnlyCollection
Addressing Elements with Index and Range Syntax
Whether using arrays, List
Example 5-37. Accessing the last element of an array with an end-relative index
char[] letters = ['a', 'b', 'c', 'd'];
char lastLetter = letters[^1];
This demonstrates one of two operators for use in indexers: the ^ operator. The other, shown in Example 5-38, is the range operator. It consists of a pair of periods (..), and it is used to identify subranges of arrays, strings, or any indexable type that implements a certain pattern. (Although this resembles the spread syntax we saw earlier, and also the slice syntax shown in Chapter 2, those appear only in collection expressions and list patterns, respectively. This range syntax is something else.)
Example 5-38. Getting a subrange of an array with the range operator
int[] numbers = [1, 2, 3, 4, 5, 6, 7];
// Gets 4th and 5th (but not the 3rd or 6th, for reasons explained shortly)
int[] theFourthTheFifth = **numbers[3..5];**
Expressions using the ^ and .. operators are of type Index and Range, respectively. These types are available in .NET but not .NET Framework, meaning that you can only use these language features on newer runtimes.
System.Index
You can put the ^ operator in front of any int expression. It produces a System.Index, a value type that represents a position. When you create an index with ^, it is end-relative, but you can also create start-relative indexes. There’s no special operator for that, but since Index offers an implicit conversion from int, you can just assign int values directly into variables of type Index, as Example 5-39 shows. You can also explicitly construct an index, as the line with var shows. The final bool argument is optional—it defaults to false—but I’m showing it to illustrate how Index knows which kind you want.
Example 5-39. Some start-relative and end-relative Index values
Index first = 0;
Index second = 1;
Index third = 2;
var fourth = new Index(3, fromEnd: false);
Index antePenultimate = ^3;
Index penultimate = ^2;
Index last = ^1;
Index directlyAfterTheLast = ^0;
As Example 5-39 shows, end-relative indexes exist independently of any particular collection. (Internally, Index stores end-relative indexes as negative numbers. This means that an Index is the same size as an int. It also means that negative start- or end-relative values are illegal—you’ll get an exception if you try to create one.) C# generates code that determines the actual element position when you use an index. If small and big are arrays with 3 and 30 elements, respectively, small[last] would return the third, and big[last] would return the 30th. C# will turn these into small[last.GetOffset(small.Length)] and big[last.GetOffset(big.Length)], respectively.
It has often been said that three of the hardest problems in computing are picking names for things and off-by-one errors. At first glance, Example 5-39 makes it look like Index might be contributing to these problems. It may be vexing that the index for the third item is two, not three, but that at least is consistent with how arrays have always worked in C# and is normal for any zero-based indexing system. But given that zero-based convention, why on earth do the end-relative indexes appear to be one-based? We denote the first element with 0 but the last element with ^1!
There are some good reasons for this. The fundamental insight is that in C#, indexes have always specified distances. When programming language designers choose a zero-based indexing system, this is not really a decision to call the first element 0: it is a decision to interpret an index as a distance from the start of an array. An upshot of this is that an index doesn’t really refer to an item. Figure 5-1 shows a collection with four elements and indicates where various index values would point in that collection. Notice that the indexes all refer to the boundaries between the items. This may seem like splitting hairs, but it’s the key to understanding all zero-based index systems, and it is behind the apparent inconsistency in Example 5-39.
Figure 5-1. Where Index values point
When you access an element of a collection by index, you are asking for the element that starts at the position indicated by the index. So array[0] retrieves the single element that starts at the beginning of the array, the element that fills the space between indexes 0 and 1. Likewise, array[1] retrieves the element between indexes 1 and 2. What would array[^0] mean?4 That would be an attempt to fetch the element that starts at the very end of the array. Since elements all take up a certain amount of space, an element that starts at the very end of the array would necessarily finish one position after the end of the array. In this four-element array, array[^0] is equivalent to array[4], so we’re asking for the element occupying the space that starts four elements from the start and that ends five elements from the start. And since this is a four-element array, that’s obviously not going to work.
The apparent discrepancy—the fact that array[0] gets the first, but we need to write array[^1] to get the last—occurs because elements sit between two indexes, and array indexers always retrieve the element between the index specified and the index after that. The fact that they do this even when you’ve specified an end-relative index is the reason those appear to be one-based. This language feature could have been designed differently: you could imagine a rule in which end-relative indexes always access the element that ends at the specified distance from the end and that starts one position earlier than that. There would have been a pleasing symmetry to this, because it would have made array[^0] refer to the final element, but this would have caused more problems than it solved.
It would be confusing to have indexers interpret an index in two different ways—it would mean that two different indexes might refer to the same position and yet fetch different elements. In any case, C# developers have long been used to things working this way. As Example 5-40 shows, the way to access the final element of an array before the ^ index operator was added was to use an index calculated by subtracting one from the length. And if you wanted the element before last, you subtracted two from the length, and so on. As you can see, the new end-relative syntax is entirely consistent with the older approach.
Example 5-40. End-relative indexing without and with Index
int lastOld = numbers[numbers.Length - 1];
int lastNew = numbers[^1];
int penultimateOld = numbers[numbers.Length - 2];
int penultimateNew = numbers[^2];
One more way to think of this is to wonder what it might look like if we accessed arrays by specifying ranges. The first element is in the range 0–1, and the last is in the range 1–0. Expressed this way, there is clearly symmetry between the start-relative and end-relative forms. And speaking of ranges…
System.Range
As I said earlier, C# has two operators useful for working with arrays and other indexable types. We’ve just looked at ^ and the corresponding Index type. The other is called the range operator, and it has a corresponding type, Range, also in the System namespace. A Range is a pair of Index values, which it makes available through Start and End properties. Range offers a constructor taking two Index values, but in C# the idiomatic way to create one is with the range operator, as Example 5-41 shows.
Example 5-41. Various ranges
Range everything = 0..^0;
Range alsoEverything = 0..;
Range everythingAgain = ..^0;
Range everythingOneMoreTime = ..;
var yetAnotherWayToSayEverything = Range.All;
Range firstThreeItems = 0..3;
Range alsoFirstThreeItems = ..3;
Range allButTheFirstThree = 3..^0;
Range alsoAllButTheFirstThree = 3..;
Range allButTheLastThree = 0..^3;
Range alsoAllButTheLastThree = ..^3;
Range lastThreeItems = ^3..^0;
Range alsoLastThreeItems = ^3..;
As you can see, if you do not put a start index before the .., it defaults to 0, and if you omit the end, it defaults to ^0 (i.e., the very start and end, respectively). The example also shows that the start can be either start-relative or end-relative, as can the end.
Warning
The default value for Range—the one you’ll get in a field or array element that you do not explicitly initialize—is 0..0. This denotes an empty range. While this is a natural upshot of the fact that value types are always initialized to zero-like values by default, it might not be what you’d expect given that .. is equivalent to Range.All.
Since Range works in terms of Index, the start and end denote offsets, not elements. For example, consider what the range 1..3 would mean for the elements shown in Figure 5-1. In this case, both indexes are start-relative. The start index, 1, is the boundary between the first and second elements (a and b), and the end index, 3, is the boundary between the third and fourth elements (c and d). So this is a range that starts at the beginning of b and ends at the end of c, as Figure 5-2 shows. So this identifies a two-element range (b and c).
Figure 5-2. A range
The interpretation of ranges sometimes surprises people when they first see it: some expect 1..3 to represent the first, second, and third elements (or, if they take into account C#’s zero-based indexing, perhaps the second, third, and fourth elements). It can seem inconsistent at first that the start index appears to be inclusive while the end index is exclusive. But once you remember that an index refers not to an item but to an offset, and therefore the boundary between two items, it all makes sense. If you draw the positions represented by a range’s indexes, as Figure 5-2 does, it becomes perfectly obvious that the range identified by 1..3 covers just two elements.
So what can we do with a Range? As Example 5-38 showed, we can use one to get a subrange of an array. That creates a new array of the relevant size and copies the values from the range into it. This same syntax also works for getting substrings, as Example 5-42 shows.
Example 5-42. Getting a substring with a range
string t1 = "dysfunctional";
string t2 = t1[3..6];
Console.WriteLine($"Putting the {t2} in {t1}");
You can also use Range with ArraySegment
Example 5-43. Getting a subrange of an ArraySegment<T> with the range operator
int[] numbers = [1, 2, 3, 4, 5, 6, 7];
ArraySegment<int> wholeArrayAsSegment = numbers;
ArraySegment<int> theFourthTheFifth = wholeArrayAsSegment[3..5];
The ArraySegment
Example 5-44. Getting a subrange of a span with the range operator
int[] numbers = [1, 2, 3, 4, 5, 6, 7];
Span<int> wholeArrayAsSpan = numbers;
Span<int> theFourthTheFifth = wholeArrayAsSpan[3..5];
ReadOnlySpan<char> textSpan = "dysfunctional";
ReadOnlySpan<char> such = textSpan[3..6];
These last two examples have much the same logical meaning as the preceding examples, but they avoid making copies of the underlying data.
We’ve now seen that we can use ranges with several types: arrays, strings, ArraySegment
Supporting Index and Range in Your Own Types
The array type does not define an indexer that accepts an argument of type Index. Nor do any of the generic array-like types shown earlier in this chapter—they all just have ordinary int-based indexers; however, you can use Index with them nonetheless. And as I explained earlier, code of the form a[index] will expand to a[index.GetOffset(a.Length)].5 So all you need is an int-based indexer and a property of type int called either Length or Count. Example 5-45 shows about the least amount of work you can possibly do to enable code to pass an Index to your type’s indexer. It’s not a very useful implementation, but it’s enough to keep C# happy.
Example 5-45. Minimally enabling Index
public class Indexable
{
public char this[int index] => (char)('0' + index);
public int Length => 10;
}
There’s an even simpler way: just define an indexer that takes an argument of type Index. However, most indexable types supply an int-based indexer, so in practice you’d be overloading your indexer, offering both forms. That is not simpler, but it would enable your code to distinguish between start- and end-relative indexes. If we use either 1 or ^9 with Example 5-45, its indexer sees 1 in either case, because C# generates code that converts the Index to a start-based int, but if you write an indexer with an Index parameter, C# will pass the Index straight in. If you overload the indexer so that both int and Index forms are available, it will never generate code that converts an Index to an int in order to call the int indexer: the pattern only kicks in if no Index-specific indexer is available.
IList
The pattern for supporting Range is different: if your type supplies an instance method called Slice that takes two integer arguments, C# will allow code to supply a Range as an indexer argument. Example 5-46 shows the least a type can do to enable this, although it’s not a very useful implementation. (As with Index, you can alternatively just define an indexer overload that accepts a Range directly. But again, an advantage to the pattern approach is that you can use it when targeting older versions that do not offer the Range or Index types—such as .NET Standard 2.0—while still supporting ranges for code that targets newer versions.)
Example 5-46. Minimally enabling Range
public class Rangeable
{
public int Length => 10;
public Rangeable Slice(int offset, int length) => this;
}
You might have noticed that this type doesn’t define an indexer. That’s because this pattern-based support for expressions of the form x[1..^1] doesn’t need one. It may look like that expression uses an indexer, but this just calls the Slice method. (Likewise, the earlier range examples with string and arrays compile into method calls.) You need the Length property (or Count) because the compiler generates code that relies on this to resolve the range’s indexes. Example 5-47 shows roughly how the compiler uses types that support this pattern.
Example 5-47. How range indexing expands
Rangeable r1 = new();
Range r = 2..^2;
Rangeable r2;
r2 = r1[r];
// is equivalent to
int startIndex = r.Start.GetOffset(r1.Length);
int endIndex = r.End.GetOffset(r1.Length);
r2 = r1.Slice(startIndex, endIndex - startIndex);
So far, all of the collections we’ve looked at have been linear: I’ve shown only simple sequences of objects or values, some of which offer indexed access. However, .NET provides other kinds of collections.
Dictionaries
One of the most useful kinds of collections is a dictionary. .NET offers the Dictionary<TKey, TValue> class, and there’s a corresponding interface called, predictably, IDictionary<TKey, TValue>, and also a read-only version, IReadOnlyDictionary<TKey, TValue>. These represent collections of key/value pairs, and their most important capability is to look up a value based on its key, making dictionaries useful for representing associations.
Suppose you are writing a UI for an application that supports online discussions. When displaying a message, you might want to show certain things about the user who sent it, such as their name and picture, and you’d probably want to avoid fetching these details from a persistent store every time; if the user is in conversation with a few friends, the same people will crop up repeatedly, so you’d want some sort of cache to avoid duplicate lookups. You might use a dictionary as part of this cache. Example 5-48 shows an outline of this approach. (It omits application-specific details of how the data is actually fetched and when old data is removed from memory.)
Example 5-48. Using a dictionary as part of a cache
public class UserCache
{
private readonly Dictionary<string, UserInfo> _cachedUserInfo = new();
public UserInfo GetInfo(string userHandle)
{
RemoveStaleCacheEntries();
if (!_cachedUserInfo.TryGetValue(userHandle, out UserInfo? info))
{
info = FetchUserInfo(userHandle);
_cachedUserInfo.Add(userHandle, info);
}
return info;
}
private UserInfo FetchUserInfo(string userHandle)
{
// fetch info...
}
private void RemoveStaleCacheEntries()
{
// application-specific logic deciding when to remove old entries...
}
}
public class UserInfo
{
// application-specific user information...
}
The first type argument, TKey, is used for lookups, and in this example, I’m using a string that identifies the user in some way. The TValue argument is the type of value associated with the key—information previously fetched for the user and cached locally in a UserInfo instance, in this case. The GetInfo method uses TryGetValue to look in the dictionary for the data associated with a user handle. There is a simpler way to retrieve a value. As Example 5-49 shows, dictionaries provide an indexer. However, that throws a KeyNotFoundException if there is no entry with the specified key. That would be fine if your code always expects to find what it’s looking for, but in our case, the key will be missing for any user whose data is not already in the cache. This will probably happen rather a lot, which is why I’m using TryGetValue. As an alternative, we could have used the ContainsKey method to see if the entry exists before retrieving it, but that’s inefficient if the value is present—the dictionary would end up looking up the entry twice, once in the call to ContainsKey and then again when we use the indexer. TryGetValue performs the test and the lookup as a single operation.
Example 5-49. Dictionary lookup with indexer
UserInfo info = _cachedUserInfo[userHandle];
As you might expect, we can also use the indexer to set the value associated with a key. I’ve not done that in Example 5-48. Instead, I’ve used the Add method, because it has subtly different semantics: by calling Add, you are indicating that you do not think any entry with the specified key already exists. Whereas the dictionary’s indexer will silently overwrite an existing entry if there is one, Add will throw an exception if you attempt to use a key for which an entry already exists. In situations where the presence of an existing key would imply that something is wrong, it’s better to call Add so that the problem doesn’t go undetected.
The IDictionary<TKey, TValue> interface requires its implementations also to provide the ICollection<KeyValuePair<TKey, TValue>> interface, and therefore also IEnumerable<KeyValuePair<TKey, TValue>>. The read-only counterpart requires the latter but not the former. These interfaces depend on a generic struct, KeyValuePair<TKey, TValue>, which is a simple container that wraps a key and a value in a single instance. This means you can iterate through a dictionary using foreach, and it will return each key/value pair in turn.
The presence of an IEnumerable
Example 5-50. Collection initializer syntax with a dictionary
var textToNumber = new Dictionary<string, int>
{
{ "One", 1 },
{ "Two", 2 },
{ "Three", 3 },
};
As you saw in Chapter 3, there’s an alternative way to populate a dictionary: instead of using a collection initializer, you can use the object initializer syntax. As you may recall, this syntax lets you set properties on a newly created object. Indexers are just a special kind of property, so it makes sense to be able to set them with an object initializer. Although Chapter 3 showed this already, it’s worth comparing object initializers with collection initializers, so Example 5-51 shows the alternative way to initialize a dictionary.
Example 5-51. Object initializer syntax with a dictionary
var textToNumber = new Dictionary<string, int>
{
["One"] = 1,
["Two"] = 2,
["Three"] = 3
};
Although the outcome is the same here with Examples 5-50 and 5-51, the compiler generates slightly different code for each. With Example 5-50, it populates the collection by calling Add, whereas Example 5-51 uses the indexer. In these examples the result is the same, so there’s no objective reason to choose one over the other, but the difference could matter in some cases. For example, if you are using a class that has an indexer but no Add method, only the index-based code would work. And even with Dictionary<TKey, TValue>, which offers both, Add throws an exception if you supply duplicate keys, whereas the indexer does not, so the collection initializer syntax has the benefit of checking that your keys are unique.
Note
You can use the collection expression syntax added in C# 12.0 to initialize an empty dictionary, but it does not provide a way to add entries. Currently, only the older initializer syntax can do this.
The Dictionary<TKey, TValue> collection class relies on hashes to offer fast lookup. Chapter 3 described the GetHashCode method, and you should ensure that whatever type you are using as a key provides a good hash implementation. The string class overrides GetHashCode so that it works well as a key, for example. The default GetHashCode method is viable only if different instances of a type are always considered to have different values, but types for which that is true function well as keys. Alternatively, the dictionary class provides constructors that accept an IEqualityComparer
Example 5-52. A case-insensitive dictionary
var textToNumber =
new Dictionary<string, int>(StringComparer.InvariantCultureIgnoreCase)
{
{ "One", 1 },
{ "Two", 2 },
{ "Three", 3 },
};
This uses the StringComparer class, which provides various implementations of IComparer
Sorted Dictionaries
Dictionary<TKey, TValue> does not guarantee to provide elements in any particular order when you iterate over its contents. In simple cases, it might return items in the order in which they were added, but you should not depend on this.
Sometimes, it’s useful to be able to retrieve the contents of a dictionary in some meaningful order. You could always get the contents into an array and then sort them, but the System.Collections.Generic namespace contains two more implementations of the IDictionary<TKey, TValue> interface that keep their contents permanently in order. There’s SortedDictionary<TKey, TValue> and the more confusingly titled SortedList<TKey, TValue>, which—despite the name—implements the IDictionary<TKey, TValue> interface and does not directly implement IList
These classes do not use hash codes. They still provide reasonably fast lookup, but they do it by keeping their contents sorted. They maintain the order every time you add a new entry, which makes addition rather slower for both these classes than with the hash-based dictionary, but it means that when you iterate over the contents, they come out in order. As with array and list sorting, you can specify custom comparison logic, but if you don’t supply that, these dictionaries require the key type to implement IComparable
The ordering maintained by a SortedDictionary<TKey, TValue> is apparent only when you use its enumeration support (e.g., with foreach). SortedList<TKey, TValue> also enumerates its contents in order, but it additionally provides numerically indexed access to the keys and values. This does not work through the object’s indexer—that expects to be passed a key just like any dictionary. Instead, the sorted list dictionary defines two properties, Keys and Values, which provide all the keys and values as IList
Inserting and removing objects is relatively expensive for the sorted list because it has to shuffle the key and value list contents up or down. (This means a single insertion has O(n) complexity.) The sorted dictionary, on the other hand, uses a tree data structure to keep its contents sorted. The exact details are not specified, but insertion and removal performance are documented as having O(log n) complexity, which is much better than for the sorted list.6 However, this more complex data structure gives a sorted dictionary a significantly larger memory footprint. This means that neither is definitively faster or better than the other—it all depends on the usage pattern, which is why the runtime libraries supply both.
In most cases, the hash-based Dictionary<TKey, Value> will provide better insertion, removal, and lookup performance than either of the sorted dictionaries, and much lower memory consumption than a SortedDictionary<TKey, TValue>, so you should use these sorted dictionary collections only if you need to access the dictionary’s contents in order.
Sets
The System.Collections.Generic namespace defines an ISet
All set types implement ICollection
Given that ICollection
Example 5-53. Using a set to determine what’s new
public static void ShowEachDistinctString(IEnumerable<string> strings)
{
var shown = new HashSet<string>(); // Implements ISet<T>
foreach (string s in strings)
{
if (shown.Add(s))
{
Console.WriteLine(s);
}
}
}
ISet
There are also some methods for comparing sets. Again, these all take an IEnumerable
Mathematical sets do not define an order for their contents, so it’s not meaningful to refer to the 1st, 10th, or nth element of a set—you can ask only whether an element is in the set or not. In keeping with this feature of mathematical sets, .NET sets do not support indexed access, so ISet
The runtime libraries offer classes that provide this interface with different implementation strategies, including: HashSet and SortedSet. As you may have guessed from the names, one of these does in fact choose to keep its elements in order; SortedSet keeps its contents sorted at all times and presents items in this order through its IEnumerable
HashSet works more like Dictionary<TKey, TValue>. It uses hash-based lookup, which can often be faster than the ordered approach, but if you enumerate through the collection with foreach, the results will not be in any useful order. (So the relationship between HashSet and SortedSet is much like that between the hash-based dictionary and the sorted dictionaries.)
Queues and Stacks
A queue is a list where you can only add items to the end of the list, and you can only remove the first item (at which point the second item, if there was one, becomes the new first item). This style of list is often called a first-in, first-out (FIFO) list. This makes it less useful than a List
To add a new item to the end of a queue, call the Enqueue method. To remove the item at the head of the queue, call Dequeue, or use Peek if you want to look at the item without removing it. Both operations will throw an InvalidOperationException if the queue is empty. You can find out how many items are in the queue with the Count property.
Although you cannot insert, remove, or change items in the middle of the list, you can inspect the whole queue, because Queue
A stack is similar to a queue, except you retrieve items from the same end as you insert them—so this is a last-in, first-out (LIFO) list. Stack
The runtime libraries do not offer a double-ended queue. However, linked lists can offer a superset of that functionality.
Linked Lists
The LinkedList
The first and last nodes in a LinkedList
The list also provides RemoveFirst and RemoveLast methods and two overloads of a Remove method that allow you to remove either the first node that has a specified value or a particular LinkedListNode
The LinkedListNode
To iterate through the contents of a linked list, you could retrieve the first node from the First property and then follow each node’s Next property until you get a null. However, LinkedList
Concurrent Collections
The collection classes described so far are designed for single-threaded usage. You are free to use different instances on different threads simultaneously, but a particular instance of any of these types must be used only from one thread at any one time.7 But some types are designed to be used by many threads simultaneously, without needing to use the synchronization mechanisms discussed in Chapter 16. These are in the System.Collections.Concurrent namespace.
The concurrent collections do not offer equivalents for every nonconcurrent collection type. Some classes are designed to solve specific concurrent programming problems. Even with the ones that do have nonconcurrent counterparts, the need for concurrent use without locking can mean that they present a somewhat different API than any of the normal collection classes.
The ConcurrentQueue
ConcurrentDictionary<TKey, TValue> looks fairly similar to its nonconcurrent cousin, but it adds some extra methods to provide the atomicity required in a concurrent world. For example, the GetOrAdd method combines the test for the presence of a key with the addition of a new entry as a single atomic operation. It returns the existing value if there is one, and it can invoke a function to get the new value when required.
There is no concurrent list, because you tend to need more coarse-grained synchronization to use ordered, indexed lists successfully in a concurrent world. But if you just want a bunch of objects, there’s ConcurrentBag
There’s also BlockingCollection
Immutable Collections
Microsoft provides a set of collection classes that guarantee immutability and yet provide a lightweight way to produce a modified version of the collection without having to make an entire new copy. (These are built into .NET, but in .NET Framework, you will need a reference to the System.Collections.Immutable NuGet package to use these.)
Immutability can be a very useful characteristic in multithreaded environments, because if you know that the data you are working with cannot change, you don’t need to take special precautions to synchronize your access to it. (This is a stronger guarantee than you get with IReadOnlyList
A low-tech approach is to build a new copy of all of your data each time something changes (e.g., when you want to add an item to a collection, create a whole new collection with a copy of all the old elements and also the new one, and use that new collection from then on). The built-in string type works in exactly the same way because it is also immutable—the methods that sound like they will change the value, such as Trim, actually return a new string. This works but can be extremely inefficient when working with large quantities of data. However, techniques exist that can effectively reuse parts of existing collections. The basic principle is that if you want to add an item to a collection, you build a new collection that just points to the data that is already there, along with some extra information to say what has changed. It is rather more complex in practice, but the key point is that there are well-established ways in which to implement various kinds of collections so that you can efficiently build what look like complete self-contained copies of the original data with some small modification applied, without either having to modify the original data or having to build a complete new copy of the collection. The immutable collections do all this for you, encapsulating the work behind some straightforward interfaces.
This enables a model where you’re free to update your application’s model without affecting code that was in the middle of using the current version of the data. Consequently, you don’t need to hold locks while reading data—you might need some synchronization when getting the latest version of the data, but thereafter, you can process the data without any concurrency concerns. This can be especially useful when writing multithreaded code. The .NET Compiler Platform (often known by its codename, Roslyn) that is the basis of Microsoft’s C# compiler uses this technique to enable compilation to exploit multiple CPU cores efficiently.
The System.Collections.Immutable namespace defines its own interfaces—IImmutableList
Example 5-54. Creating immutable dictionaries
IImmutableDictionary<int, string> d = ImmutableDictionary.Create<int, string>();
d = d.Add(1, "One");
d = d.Add(2, "Two");
d = d.Add(3, "Three");
The whole point of immutable types is that code using an existing object can be certain that nothing will change, so additions, removals, or modifications necessarily mean creating a new object that looks just like the old one but with the modification applied. So in Example 5-54, the variable d refers successively to four different immutable dictionaries: an empty one, one with one value, one with two values, and finally one with all three values.
If you are adding a range of values like this, and you won’t be making intermediate results available to other code, it is more efficient to add multiple values in a single operation, because it doesn’t have to produce a separate IImmutableDictionary<TKey, TValue> for each entry you add. (You could think of immutable collections as working a bit like a source control system, with each change corresponding to a commit—for every commit you do, a version of the collection will exist that represents its contents immediately after that change.) It’s more efficient to batch a bunch of related changes into a single “version” so the collections all have AddRange methods that let you add multiple items in one step.
When you’re building a new collection from scratch, the same principle applies: it will be more efficient if you put all of the initial content into the first version of the collection, instead of adding items one at a time. Each immutable collection type offers a nested Builder class to make this easier, enabling you to add items one at a time but to defer the creation of the actual collection until you have finished. Example 5-55 shows how this is done.
Example 5-55. Creating an immutable dictionary with a builder
ImmutableDictionary<int, string>.Builder b =
ImmutableDictionary.CreateBuilder<int, string>();
b.Add(1, "One");
b.Add(2, "Two");
b.Add(3, "Three");
IImmutableDictionary<int, string> d = b.ToImmutable();
The builder object is not immutable. Much like StringBuilder, it is a mutable object that provides an efficient way to build a description of an immutable object.
All of the immutable collection types except for dictionaries support the collection expression syntax introduced in C# 12.0. Example 5-56 uses this to create an immutable list.
Example 5-56. Creating an immutable list with a collection expression
ImmutableList<int> numbers = [1, 2, 3, 4, 5];
In addition to the immutable list, dictionary, queue, stack, and set types, there’s one more immutable collection class that is a bit different than the rest: ImmutableArray
When you call Add on an immutable list, it will attempt to reuse most of the data that is already there, so if you have a million items in your list, the “new” list returned by Add won’t contain a new copy of those items—it will mostly reuse the data that was already there. However, to achieve this, ImmutableList
An ImmutableArray
Frozen Collections
.NET 8.0 adds a new namespace, System.Collection.Frozen, which defines FrozenDictionary<TKey, TValue> and FrozenSet
Despite the name, the immutable collection types are designed to accommodate change; they just do this by providing snapshots, and there are overheads involved in enabling change while making each individual snapshot immutable. That makes these types suboptimal in cases where there will be no further changes. (ImmutableArray
The runtime libraries define extension methods for creating instances of the frozen collection types from other collections. Example 5-57 shows how to obtain a frozen dictionary from a normal one. There’s a similar ToFrozenSet extension method that can convert any IEnumerable
Example 5-57. Creating a FrozenDictionary<TKey, TValue> from a Dictionary<TKey, TValue>
Dictionary<int, string> ordinary = GetOrdinaryDictionary();
FrozenDictionary<int, string> frozen = ordinary.ToFrozenDictionary();
FrozenDictionary implements IDictionary<TKey, TValue> and IReadOnlyDictionary<TKey, TValue>, so once you’ve created one you can use it like a normal dictionary. (It implements the mutable IDictionary<TKey, TValue> because some libraries require that interface even when they won’t actually modify the dictionary, but its implementation will throw a NotSupportedException if you try to modify the collection.)
The frozen collection types won’t always be faster. In scenarios where they are a good fit—when you can supply the entire contents up front—they will usually outperform the immutable collections, but they won’t necessarily outperform the ordinary collection types such as Dictionary<TKey, TValue>. The frozen collections do extra work during initialization in exchange for making lookups faster, so they only offer an improvement if you perform enough lookups to offset that initial cost. As with any performance-oriented change you should always measure the performance impact of introducing the frozen collections. The number of lookups needed to make these worthwhile will depend on the collection size, and also the key and element types, so there are no fixed rules about when the frozen collections will offer a benefit, but to give you a rough idea, I ran some tests using dictionaries with 100,000 elements. (You should take this with a grain of salt because all of the factors I just mentioned can make significant differences.) Lookups in this particular test were around 25% faster with the frozen dictionary than with Dictionary<TKey, TValue>, but it took over two million lookups to offset the higher initialization costs.
Summary
In this chapter, we saw the intrinsic support for arrays offered by the runtime and also the various collection classes that .NET provides when you need more than a fixed-size list of items. Next, we’ll look at a more advanced topic: inheritance.
1 Surprisingly, foreach doesn’t require any particular interface; it will use anything with a GetEnumerator method that returns an object providing a MoveNext method and a Current property. Before generics, this was the only way to enable iteration through collections of value-typed elements without boxing every item. Chapter 7 describes boxing. Even though generics have fixed that, non-interface-based enumeration continues to be useful because it enables collection classes to provide an extra GetEnumerator that returns a struct, avoiding an additional heap allocation when the foreach loop starts. List
2 Some of this cleanup work happens in the call to Dispose. Remember, IEnumerator
3 You might be wondering about ref readonly, described in Chapter 3, but that’s a different kind of reference. A ref readonly to some reference-type variable only means you can’t change the variable; it won’t stop you making changes to whatever that variable refers to.
4 Since end-relative indexes are stored as negative numbers, you might be wondering whether ^0 is even legal, given that the int type does not distinguish between positive and negative zero. It is allowed because, as you’ll soon see, ^0 is useful when using ranges, so Index is able to make the distinction.
5 In cases where you use ^ directly against an int inside an array indexer (e.g., a[^i] where i is an int), the compiler generates marginally simpler code. Instead of converting i to an Index, then calling GetOffset, it will generate code equivalent to a[a.Length - i].
6 The usual complexity analysis caveats apply—for small collections, the simpler data structure might well win, with the sorted list’s theoretical advantage only coming into effect with larger collections.
7 There’s an exception to this rule: you can use a collection from multiple threads as long as none of the threads attempts to modify it.