Programming C# 12

Chapter 10. LINQ

Language Integrated Query (LINQ) is a powerful collection of C# language features for working with sets of information. It is useful in any application that needs to work with multiple pieces of data (i.e., almost any application). Although one of its original goals was to provide straightforward access to relational databases, LINQ is applicable to many kinds of information. For example, it can also be used with in-memory object models, HTTP-based information services, JSON, and XML documents. And as we’ll see in Chapter 11, it can work with live streams of data too.

LINQ is not a single feature. It relies on several language elements that work together. The most conspicuous LINQ-related language feature is the query expression, a form of expression that loosely resembles a database query but that can be used to perform queries against any supported source, including plain old objects. As you’ll see, query expressions rely heavily on some other language features such as lambdas, extension methods, and expression object models.

Language support is only half the story. LINQ needs class libraries to implement a set of querying primitives called LINQ operators. Each different kind of data requires its own implementation, and a set of operators for any particular type of information is referred to as a LINQ provider. (These can also be used from Visual Basic and F#, by the way, because those languages support LINQ too.) Microsoft supplies several providers, some built into the runtime libraries and some available as separate NuGet packages. There is a provider for Entity Framework Core (EF Core) for example, an object/relational mapping system for working with databases. The Cosmos DB cloud database (a feature of Microsoft Azure) offers a LINQ provider. And the Reactive Extensions for .NET (Rx) described in Chapter 11 provide LINQ support for live streams of data. In short, LINQ is a widely supported idiom in .NET, and it’s extensible, so you will also find open source and other third-party providers.

Most of the examples in this chapter use LINQ to Objects. This is partly because it avoids cluttering the examples with extraneous details such as database or service connections, but there’s a more important reason. LINQ’s introduction in 2007 significantly changed the way I write C#, and that’s entirely because of LINQ to Objects. Although LINQ’s query syntax makes it look like it’s primarily a data access technology, I have found it to be far more valuable than that. Having LINQ’s services available on any collection of objects makes it useful in every part of your code.

Query Expressions

The most visible feature of LINQ is the query expression syntax. It’s not the most important—as we’ll see later, it’s entirely possible to use LINQ productively without ever writing a query expression. However, it’s a very natural syntax for many kinds of queries.

At first glance, a query expression loosely resembles a relational database query, but the syntax works with any LINQ provider. Example 10-1 shows a query expression that uses LINQ to Objects to search for certain CultureInfo objects. (A CultureInfo object provides a set of culture-specific information, such as the symbol used for the local currency, what language is spoken, and so on. Some systems call this a locale.) This particular query looks at the character that denotes what would, in English, be called the decimal point. Many countries actually use a comma instead of a period, and in those countries, 100,000 would mean the number 100 written out to three decimal places; in English-speaking cultures, we would normally write this as 100.000. The query expression searches all the cultures known to the system and returns those that use a comma as the decimal separator.

Example 10-1. A LINQ query expression

**IEnumerable<CultureInfo> commaCultures =**
    **from culture in CultureInfo.GetCultures(CultureTypes.AllCultures)**
    **where culture.NumberFormat.NumberDecimalSeparator == ","**
    **select culture;**

foreach (CultureInfo culture in commaCultures)
{
    Console.WriteLine(culture.Name);
}

The foreach loop in this example shows the results of the query. The output will vary according to the language support installed on the system you run it on. On my system, this lists the names of 366 cultures, indicating that slightly under half of the 869 available cultures use a comma, not a period, as the decimal separator. Of course, I could easily have achieved this without using LINQ. Example 10-2 will produce the same results.

Example 10-2. The non-LINQ equivalent

CultureInfo[] allCultures = CultureInfo.GetCultures(CultureTypes.AllCultures);
foreach (CultureInfo culture in allCultures)
{
    if (culture.NumberFormat.NumberDecimalSeparator == ",")
    {
        Console.WriteLine(culture.Name);
    }
}

Both examples have eight nonblank lines of code, although if you ignore lines that contain only braces, Example 10-2 contains just four, two fewer than Example 10-1. Then again, if we count statements, the LINQ example has just three, compared to four in the loop-based example. So it’s difficult to argue convincingly that either approach is simpler than the other.

However, Example 10-1 has a significant advantage: the code that decides which items to choose is well separated from the code that decides what to do with those items. Example 10-2 intermingles these two concerns: the code that picks the objects is half outside and half inside the loop.

Another difference is that Example 10-1 has a more declarative style: it focuses on what we want, not how to get it. The query expression describes the items we’d like, without mandating that this be achieved in any particular way. For this very simple example, that doesn’t matter much, but for more complex examples, and particularly when using a LINQ provider for database access, it can be very useful to allow the provider a free hand in deciding exactly how to perform the query. Example 10-2’s approach of iterating over everything in a foreach loop and picking the item it wants would be a bad idea if we were talking to a database—you generally want to let the server do this sort of filtering work.

The query in Example 10-1 has three parts. All query expressions are required to begin with a from clause, which specifies the source of the query. In this case, the source is an array of type CultureInfo[], returned by the CultureInfo class’s GetCultures method. As well as defining the source for the query, the from clause contains a name, culture. This is called the range variable, and we can use it in the rest of the query to represent a single item from the source. Clauses can run many times—the where clause in Example 10-1 runs once for every item in the collection, so the range variable will have a different value each time. This is reminiscent of the iteration variable in a foreach loop. In fact, the overall structure of the from clause is similar—we have the variable that will represent an item from a collection, then the in keyword, then the source for which that variable will represent individual items. Just as a foreach loop’s iteration variable is in scope only inside the loop, the range variable culture is meaningful only inside this query expression.

Note

Although analogies with foreach can be helpful for understanding the intent of LINQ queries, you shouldn’t take this too literally. For example, not all providers directly execute the expressions in a query. Some LINQ providers convert query expressions into database queries, in which case the C# code in the various expressions inside the query does not run in any conventional sense. So, although it is true to say that the range variable represents a single value from the source, it’s not always true to say that clauses will execute once for every item they process, with the range value taking that item’s value. It happens to be true for Example 10-1 because it uses LINQ to Objects, but it’s not so for all providers.

The second part of the query in Example 10-1 is a where clause. This clause is optional, or if you want, you can have several in one query. A where clause filters the results, and the one in this example states that I want only the CultureInfo objects with a NumberFormat that indicates that the decimal separator is a comma.

The final part of the query is a select clause. All query expressions end with either a select clause or a group clause. This final clause determines the output of the query. This example indicates that we want each CultureInfo object that was not filtered out by the where clause. The foreach loop in Example 10-1 that shows the results of the query uses only the Name property, so I could have written a query that extracted only that. As Example 10-3 shows, if I do this, I also need to change the loop, because the resulting query now produces strings instead of CultureInfo objects.

Example 10-3. Extracting just one property in a query

IEnumerable<string> commaCultures =
    from culture in CultureInfo.GetCultures(CultureTypes.AllCultures)
    where culture.NumberFormat.NumberDecimalSeparator == ","
    **select culture.Name;**

foreach (string cultureName in commaCultures)
{
    Console.WriteLine(cultureName);
}

This raises a question: In general, what type do query expressions have? In Example 10-1, commaCultures is an IEnumerable<CultureInfo>; in Example 10-3, it’s an IEnumerable<string>. The output item type is determined by the final clause of the query: the select or, in some cases, the group clause. However, not all query expressions result in an IEnumerable<T>. It depends on which LINQ provider you use. I’ve ended up with IEnumerable<T> because I’m using LINQ to Objects.

Note

It’s common to use the var keyword when declaring variables that hold LINQ queries. This is necessary if a select clause produces instances of an anonymous type, because there is no way to write the name of the resulting query’s type. Even if anonymous types are not involved, var is still widely used, and there are two reasons. One is just a matter of consistency: some people feel that because you have to use var for some LINQ queries, you should use it for all of them. Another argument is that LINQ query types often have verbose and ugly names, and var results in less cluttered code. In this chapter I have used var where necessary.
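As a minimal sketch of the case where var is unavoidable, the following variation on Example 10-1 projects each culture into an anonymous type. The property name Separator is chosen purely for illustration.

```csharp
using System;
using System.Globalization;
using System.Linq;

// The select clause builds an anonymous type, so the query's type has no
// name we could write out; var is the only way to declare the variable.
var commaCultures =
    from culture in CultureInfo.GetCultures(CultureTypes.AllCultures)
    where culture.NumberFormat.NumberDecimalSeparator == ","
    select new
    {
        culture.Name,
        Separator = culture.NumberFormat.NumberDecimalSeparator
    };

foreach (var item in commaCultures)
{
    Console.WriteLine($"{item.Name}: '{item.Separator}'");
}
```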

How did C# know that I wanted to use LINQ to Objects? It’s because I used an array as the source in the from clause. More generally, LINQ to Objects will be used when you specify any IEnumerable<T> as the source, unless a more specialized provider is available. However, this doesn’t really explain how C# discovers the existence of providers in the first place and how it chooses between them. To understand that, you need to know what the compiler does with a query expression.

How Query Expressions Expand

The compiler converts all query expressions into one or more method calls. Once it has done that, the LINQ provider is selected through exactly the same mechanisms that C# uses for any other method call. The compiler does not have any built-in concept of what constitutes a LINQ provider. It just relies on convention. Example 10-4 shows what the compiler does with the query expression in Example 10-3.

Example 10-4. The effect of a query expression

IEnumerable<string> commaCultures =
    CultureInfo.GetCultures(CultureTypes.AllCultures)
    .Where(culture => culture.NumberFormat.NumberDecimalSeparator == ",")
    .Select(culture => culture.Name);

The Where and Select methods are examples of LINQ operators. A LINQ operator is nothing more than a method that conforms to one of the standard patterns. I’ll describe these patterns later, in “Standard LINQ Operators”.

The code in Example 10-4 is all one statement, and I’m chaining method calls together—I call the Where method on the return value of GetCultures, and I call the Select method on the return value of Where. The formatting looks a little peculiar, but it’s too long to go on one line; and, even though it’s not terribly elegant, I prefer to put the . at the start of the line when splitting chained calls across multiple lines, because it makes it much easier to see that each new line continues from where the last one left off. Leaving the period at the end of the preceding line looks neater but also makes it much easier to misread the code.

The compiler has turned the where and select clauses’ expressions into lambdas. Notice that the range variable ends up as a parameter in each lambda. This is one example of why you should not take the analogy between query expressions and foreach loops too literally. Unlike a foreach iteration variable, the range variable does not exist as a single conventional variable. In the query, it is just an identifier that represents an item from the source, and in expanding the query into method calls, C# may end up creating multiple real variables for a single range variable, like it has with the arguments for the two separate lambdas here.

All query expressions boil down to this sort of thing—chained method calls with lambdas. (This is why we don’t strictly need the query expression syntax—you could write any query using method calls instead.) Some are more complex than others. The expression in Example 10-1 ends up with a simpler structure despite looking almost identical to Example 10-3. Example 10-5 shows how it expands. It turns out that when a query’s select clause just passes the range variable straight through, the compiler interprets that as meaning that we want to pass the results of the preceding clause straight through without further processing, so it doesn’t add a call to Select. (There is one exception to this: if you write a query expression that contains nothing but a from and a select clause, it will generate a call to Select even if the select clause is trivial.)

Example 10-5. How trivial `select` clauses expand

IEnumerable<CultureInfo> commaCultures =
    CultureInfo.GetCultures(CultureTypes.AllCultures)
    .Where(culture => culture.NumberFormat.NumberDecimalSeparator == ",");
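The difference between the two kinds of trivial select can be observed directly. The following is a small sketch using an int array (the variable names are illustrative): a query consisting of only from and select still produces a new sequence object, because a Select call is emitted, whereas a query with a where clause simply ends with the Where call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] source = [1, 2, 3];

// Only from and select: the compiler still emits source.Select(n => n),
// so the result is a new sequence object, not the array itself.
IEnumerable<int> passThrough = from n in source select n;

// With a where clause present, the trivial select is dropped and the
// query becomes just source.Where(n => n > 1).
IEnumerable<int> filtered = from n in source where n > 1 select n;

Console.WriteLine(ReferenceEquals(passThrough, source)); // False
Console.WriteLine(string.Join(", ", filtered));           // 2, 3
```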

The compiler has to work harder if you introduce multiple variables within the query’s scope. You can do this with a let clause. Example 10-6 performs the same job as Example 10-3, but I’ve introduced a new variable called numFormat to refer to the number format. This makes my where clause shorter and easier to read, and in a more complex query that needed to refer to that format object multiple times, this technique could remove a lot of clutter.

Example 10-6. Query with a `let` clause

IEnumerable<string> commaCultures =
    from culture in CultureInfo.GetCultures(CultureTypes.AllCultures)
    **let numFormat = culture.NumberFormat**
    where numFormat.NumberDecimalSeparator == ","
    select culture.Name;

When you write a query that introduces additional variables like this, the compiler automatically generates an anonymous type with a property for each of the variables so that it can make them all available at every stage. To get the same effect with ordinary method calls, we’d need to do something similar, as Example 10-7 shows.

Example 10-7. How multivariable query expressions expand (approximately)

IEnumerable<string> commaCultures =
    CultureInfo.GetCultures(CultureTypes.AllCultures)
    .Select(culture => new { culture, numFormat = culture.NumberFormat })
    .Where(vars => vars.numFormat.NumberDecimalSeparator == ",")
    .Select(vars => vars.culture.Name);

No matter how simple or complex they are, query expressions are nothing more than a specialized syntax for method calls.
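To illustrate that the compiler relies purely on convention, here is a sketch of a query expression running against a type that implements no LINQ-related interface at all. Box<T> is a hypothetical type invented for this example; it holds at most one value and supplies its own Where and Select methods, which is all the compiler looks for.

```csharp
using System;

// These compile because Box<T> has suitable Where and Select methods.
// The first expands to new Box<int>(42).Where(...).Select(...).
Box<string> result =
    from n in new Box<int>(42)
    where n > 10
    select $"value: {n}";

// Trivial select after a where clause: expands to just a Where call.
Box<int> empty =
    from n in new Box<int>(5)
    where n > 10
    select n;

Console.WriteLine(result.Value);   // value: 42
Console.WriteLine(empty.HasValue); // False

// Hypothetical single-item container, not part of any library.
public sealed class Box<T>
{
    private readonly T[] _items; // zero or one element

    public Box(T value) => _items = new[] { value };
    private Box() => _items = Array.Empty<T>();

    public bool HasValue => _items.Length == 1;
    public T Value => _items[0];

    public Box<T> Where(Func<T, bool> predicate) =>
        HasValue && predicate(Value) ? this : new Box<T>();

    public Box<TResult> Select<TResult>(Func<T, TResult> selector) =>
        HasValue ? new Box<TResult>(selector(Value)) : new Box<TResult>();
}
```

Nothing here refers to IEnumerable<T> or the System.Linq namespace; ordinary method lookup finds Where and Select on Box<T>.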

Deferred Evaluation

LINQ to Objects has been designed to work well with sequences like the one returned by the Fibonacci method in Example 10-8. That returns a never-ending sequence: it will keep providing numbers from the Fibonacci series for as long as the code keeps asking for them. I have used the IEnumerable<BigInteger> returned by this method as the source for a query expression.

Example 10-8. Query with an infinite source sequence

using System.Numerics;

static IEnumerable<BigInteger> Fibonacci()
{
    BigInteger n1 = 1;
    BigInteger n2 = 1;
    yield return n1;
    while (true)
    {
        yield return n2;
        BigInteger t = n1 + n2;
        n1 = n2;
        n2 = t;
    }
}

IEnumerable<BigInteger> evenFib = from n in Fibonacci()
                                  where n % 2 == 0
                                  select n;

foreach (BigInteger n in evenFib)
{
    Console.WriteLine(n);
}

This will use the Where extension method that LINQ to Objects provides for IEnumerable<T>. You could imagine an implementation of Where that iterates through its source collection, putting items that match the criteria into a List<T> that it returns once it has worked through the whole input. But if it worked that way, this program would never make it as far as displaying a single number. Where can never work its way through the whole input here because my Fibonacci enumerator is infinite.

In fact, Example 10-8 works perfectly: it produces a steady stream of output consisting of the Fibonacci numbers that are divisible by 2. This means it can’t be attempting to perform all of the filtering when we call Where. Instead, the Where method returns an IEnumerable<T> that filters items on demand. It won’t try to fetch anything from the input sequence until something asks for a value, at which point it will start retrieving one value after another from the source until the filter delegate says that a match has been found. It then produces that and doesn’t try to retrieve anything more from the source until it is asked for the next item. Example 10-9 shows how you could implement this behavior by taking advantage of C#’s yield return feature.

Example 10-9. A custom deferred `Where` operator

public static class CustomDeferredLinqProvider
{
    public static IEnumerable<T> Where<T>(this IEnumerable<T> src,
                                          Func<T, bool> filter)
    {
        foreach (T item in src)
        {
            if (filter(item))
            {
                yield return item;
            }
        }
    }
}

The real LINQ to Objects implementation of Where is somewhat more complex. It detects certain special cases, such as arrays and lists, and it handles them in a way that is slightly more efficient than the general-purpose implementation that it falls back to for other types. However, the principle is the same for Where and most of the other operators: these methods do not perform the specified work. Instead, they return objects that will perform the work on demand. It’s only when you attempt to retrieve the results of a query that anything really happens. This is called deferred evaluation, or sometimes lazy evaluation.
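One way to see deferred evaluation in action is to count how often the filter runs. The following sketch uses the built-in LINQ to Objects Where with a lambda that increments a counter (predicateCalls is just an illustrative name).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int predicateCalls = 0;
int[] source = [1, 2, 3, 4, 5, 6];

// Building the query runs no filtering code at all.
IEnumerable<int> evens = source.Where(n =>
{
    predicateCalls++;
    return n % 2 == 0;
});
Console.WriteLine(predicateCalls); // 0

// First asks for a single item, so the filter runs only until the
// first match: once for 1 (rejected) and once for 2 (accepted).
int firstEven = evens.First();
Console.WriteLine($"{firstEven} after {predicateCalls} calls"); // 2 after 2 calls
```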

Deferred evaluation has the benefit of not doing work until you need it, and it makes it possible to work with infinite sequences. However, it also has disadvantages. You may need to be careful to avoid evaluating queries multiple times. Example 10-10 makes this mistake, causing it to do much more work than necessary. This loops through several different numbers and writes out each one using the currency format of each culture that uses a comma as a decimal separator.

Note

If you run this on Windows, you may find that most of the lines this code displays will contain ? characters, indicating that the console cannot display most of the currency symbols. In fact, it can—it just needs permission. By default, the Windows console uses an 8-bit code page for backward-compatibility reasons. If you run the command chcp 65001 from a Command Prompt, it will switch that console window into a UTF-8 code page, enabling it to show any Unicode characters supported by your chosen console font. You might want to configure the console to use a font with comprehensive support for uncommon characters—Consolas or Lucida Console, for example—to take best advantage of that.

Example 10-10. Accidental reevaluation of a deferred query

IEnumerable<CultureInfo> commaCultures =
    from culture in CultureInfo.GetCultures(CultureTypes.AllCultures)
    where culture.NumberFormat.NumberDecimalSeparator == ","
    select culture;

object[] numbers = [1, 100, 100.2, 10000.2];

foreach (object number in numbers)
{
    foreach (CultureInfo culture in commaCultures)
    {
        Console.WriteLine(string.Format(culture, "{0}: {1:c}",
                          culture.Name, number));
    }
}

The problem with this code is that even though the commaCultures variable is initialized outside of the number loop, we iterate through it for each number. And because LINQ to Objects uses deferred evaluation, that means that the actual work of running the query is redone every time around the outer loop. So, instead of evaluating that where clause once for each culture (869 times on my system), it ends up running four times for each culture (3,476 times) because the whole query is evaluated once for each of the four items in the numbers array. It’s not a disaster—the code still works correctly. But if you do this in a program that runs on a heavily loaded server, it will harm your throughput.

If you know you will need to iterate through the results of a query multiple times, consider using either the ToList or ToArray extension methods provided by LINQ to Objects. These immediately evaluate the whole query once, producing a List<T> or a T[] array, respectively (so you shouldn’t use these methods on infinite sequences, obviously). You can then iterate through that as many times as you like without incurring any further costs (beyond the minimal cost inherent in reading array or list elements). But in cases where you iterate through a query only once, it is usually better not to use these methods, as they’ll consume more memory than necessary.
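The cost difference is easy to measure with a counter. This sketch, which uses a small int array rather than the culture list, enumerates a deferred query three times and then does the same with a ToList snapshot.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int predicateCalls = 0;
int[] source = [1, 2, 3, 4];

IEnumerable<int> deferred = source.Where(n => { predicateCalls++; return n > 2; });

// Each enumeration of the deferred query reruns the filter over all 4 items.
for (int i = 0; i < 3; i++)
{
    foreach (int item in deferred) { }
}
Console.WriteLine(predicateCalls); // 12

// ToList runs the filter once and stores the matches; the loops that
// follow only read the list.
predicateCalls = 0;
List<int> snapshot = deferred.ToList();
for (int i = 0; i < 3; i++)
{
    foreach (int item in snapshot) { }
}
Console.WriteLine(predicateCalls); // 4
```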

LINQ, Generics, and IQueryable

LINQ providers use generic types. LINQ to Objects uses IEnumerable<T>. Several of the database providers use a type called IQueryable<T>. More broadly, the pattern is to have some generic type Source<T>, where Source represents some source of items, and T is the type of an individual item. A source type with LINQ support makes operator methods available on Source<T> for any T, and those operators also typically return Source<TResult>, where TResult may or may not be different than T.

IQueryable<T> is interesting because it is designed to be used by multiple providers. This interface, its base IQueryable, and the related IQueryProvider are shown in Example 10-11.

Example 10-11. `IQueryable` and `IQueryable<T>`

public interface IQueryable : IEnumerable
{
    Type ElementType { get; }
    Expression Expression { get; }
    IQueryProvider Provider { get; }
}

public interface IQueryable<out T> : IEnumerable<T>, IQueryable
{
}

public interface IQueryProvider
{
    IQueryable CreateQuery(Expression expression);
    IQueryable<TElement> CreateQuery<TElement>(Expression expression);
    object? Execute(Expression expression);
    TResult Execute<TResult>(Expression expression);
}

The most obvious feature of IQueryable<T> is that it adds no members to its bases. That’s because it’s designed to be used entirely via extension methods. The System.Linq namespace defines all of the standard LINQ operators for IQueryable<T> as extension methods provided by the Queryable class. However, all of these simply defer to the Provider property defined by the IQueryable base. So, unlike LINQ to Objects, where the extension methods on IEnumerable<T> define the behavior, an IQueryable<T> implementation is able to decide how to handle queries because it gets to supply the IQueryProvider that does the real work.
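You can watch this deferral happen without a database. The sketch below uses AsQueryable, which wraps an in-memory array in an IQueryable<int> backed by a provider built into the runtime libraries, and then inspects the query that Queryable.Where produces.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// AsQueryable wraps an in-memory sequence in an IQueryable<int>.
IQueryable<int> source = new[] { 1, 2, 3, 4 }.AsQueryable();

// Queryable.Where does no filtering. It asks source.Provider to create a
// new query whose Expression describes a call to Where on the old query.
IQueryable<int> query = source.Where(n => n > 2);

var call = (MethodCallExpression)query.Expression;
Console.WriteLine(call.Method.DeclaringType == typeof(Queryable)); // True
Console.WriteLine(call.Method.Name);                               // Where

// Enumerating hands the expression to the provider for execution.
Console.WriteLine(string.Join(", ", query)); // 3, 4
```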

However, all IQueryable<T>-based LINQ providers have one thing in common: they interpret the lambdas as expression objects, not delegates. Example 10-12 shows the declaration of the Where extension methods defined for IEnumerable<T> and IQueryable<T>. Compare the predicate parameters.

Example 10-12. `Enumerable` versus `Queryable`

public static class Enumerable
{
    public static IEnumerable<TSource> Where<TSource>(
        this IEnumerable<TSource> source,
        **Func<TSource, bool> predicate)**
    ...
}

public static class Queryable
{
    public static IQueryable<TSource> Where<TSource>(
        this IQueryable<TSource> source,
        **Expression<Func<TSource, bool>> predicate)**
    ...
}

The Where extension for IEnumerable<T> (LINQ to Objects) takes a Func<TSource, bool>, and as you saw in Chapter 9, this is a delegate type. But the Where extension method for IQueryable<T> (used by numerous LINQ providers) takes Expression<Func<TSource, bool>>, and as you also saw in Chapter 9, this causes the compiler to build an object model of the expression and pass that as the argument.

A LINQ provider typically uses IQueryable<T> if it wants these expression trees. And that’s usually because it’s going to inspect your query and convert it into something else, such as a SQL query.
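The difference between the two parameter types is visible with nothing more than a pair of variable declarations. In this sketch the same lambda text is assigned to each type; the variable names are illustrative.

```csharp
using System;
using System.Linq.Expressions;

// Same lambda text, two different targets.
Func<int, bool> asDelegate = n => n > 10;                // compiled code
Expression<Func<int, bool>> asExpression = n => n > 10;  // data describing the code

Console.WriteLine(asDelegate(42)); // True

// An expression tree can be inspected, which is what a provider does
// before translating it into something else, such as SQL.
var body = (BinaryExpression)asExpression.Body;
Console.WriteLine(body.NodeType); // GreaterThan
Console.WriteLine(body.Right);    // 10

// It can also be compiled into a delegate if we want to run it locally.
Console.WriteLine(asExpression.Compile()(5)); // False
```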

There are some other common generic types that crop up in LINQ. Some LINQ features guarantee to produce items in a certain order, and some do not. More subtly, a handful of operators produce items in an order that depends upon the order of their input. This can be reflected in the types for which the operators are defined and the types they return. LINQ to Objects defines IOrderedEnumerable<T> to represent ordered data, and there’s a corresponding IOrderedQueryable<T> type for IQueryable<T>-based providers. (Providers that use their own types tend to do something similar: Parallel LINQ, described in Chapter 16, defines an OrderedParallelQuery<T>, for example.) These derive from their unordered counterparts, such as IEnumerable<T> and IQueryable<T>, so all the usual operators are available, but they make it possible to define operators or other methods that need to take the existing order of their input into account. For example, in “Ordering”, I will show a LINQ operator called ThenBy, which is available only on sources that are already ordered.

When looking at LINQ to Objects, this ordered/unordered distinction may seem unnecessary, because IEnumerable<T> always produces items in some sort of order. But some providers do not necessarily do things in any particular order, perhaps because they parallelize query execution, or because they get a database to execute the query for them, and databases reserve the right to meddle with the order in certain cases if it enables them to work more efficiently.
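To make the role of the ordered types concrete, here is a brief LINQ to Objects sketch. The word list is made up for illustration; the point is the static types involved rather than the sorting itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] words = ["pear", "fig", "apple", "kiwi"];

// OrderBy returns IOrderedEnumerable<string>, which is what ThenBy requires.
IOrderedEnumerable<string> byLength = words.OrderBy(w => w.Length);
IOrderedEnumerable<string> byLengthThenAlpha =
    byLength.ThenBy(w => w, StringComparer.Ordinal);

// Where returns a plain IEnumerable<string>, so this would not compile:
// words.Where(w => w.Length > 3).ThenBy(w => w);

Console.WriteLine(string.Join(", ", byLengthThenAlpha)); // fig, kiwi, pear, apple
```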

Standard LINQ Operators

In this section, I will describe the standard operators that LINQ providers can supply. Where applicable, I will also describe the query expression equivalent, although many operators do not have a corresponding query expression form. Some LINQ features are available only through explicit method invocation. This is even true with certain operators that can be used in query expressions, because most operators are overloaded, and query expressions can’t use some of the more advanced overloads.

Note

LINQ operators are not operators in the usual C# sense—they are not symbols such as + or &&. LINQ has its own terminology, and for this chapter, an operator is a query capability offered by a LINQ provider. In C#, it looks like a method.

All of these operators have something in common: they have all been designed to support composition. This means that you can combine them in almost any way you like, making it possible to build complex queries out of simple elements. To enable this, operators not only take some type representing a set of items (e.g., an IEnumerable<T>) as their input, but most of them also return something representing a set of items. As already mentioned, the item type is not always the same: an operator might take some IEnumerable<T> as input, and produce IEnumerable<TResult> as output, where TResult does not have to be the same as T. Even so, you can still chain the things together in any number of ways. Part of the reason this works is that LINQ operators are like mathematical functions in that they do not modify their inputs; rather, they produce a new result that is based on their operands. (Functional programming languages typically have the same characteristic.) This means that not only are you free to plug operators together in arbitrary combinations without fear of side effects, but you are also free to use the same source as the input to multiple queries, because no LINQ query will ever modify its input. Each operator returns a new query based on its input.

Nothing enforces this functional style. The compiler doesn’t care what a method representing a LINQ operator does. It is only by convention that operators are functional, in order to support composition, but the built-in LINQ providers all work this way.
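As a small sketch of what composition looks like with LINQ to Objects, the following builds several queries from one array, including one layered on top of another, and then confirms that the source is untouched. The data and names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] source = [5, 3, 8, 1];

// Two queries over the same source, plus a third composed on top of the first.
IEnumerable<int> big = source.Where(n => n > 2);
IEnumerable<int> small = source.Where(n => n <= 2);
IEnumerable<int> doubledAndSorted = big.Select(n => n * 2).OrderBy(n => n);

Console.WriteLine(string.Join(", ", big));              // 5, 3, 8
Console.WriteLine(string.Join(", ", small));            // 1
Console.WriteLine(string.Join(", ", doubledAndSorted)); // 6, 10, 16

// None of the queries modified the array they read from.
Console.WriteLine(string.Join(", ", source));           // 5, 3, 8, 1
```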

Not all providers offer complete support for all operators. The main providers Microsoft supplies—such as LINQ to Objects or the LINQ support in EF Core and Rx—are as comprehensive as they can be, but there are some situations in which certain operators will not make sense.

To demonstrate the operators in action, I need some source data. Many of the examples in the following sections will use the code in Example 10-13.

Example 10-13. Sample input data for LINQ queries

public record Course(
    string Title,
    string Category,
    int Number,
    DateOnly PublicationDate,
    TimeSpan Duration)
{
    public static readonly Course[] Catalog =
    [
        new Course(
            Title: "Elements of Geometry",
            Category: "MAT", Number: 101, Duration: TimeSpan.FromHours(3),
            PublicationDate: new DateOnly(2009, 5, 20)),
        new Course(
            Title: "Squaring the Circle",
            Category: "MAT", Number: 102, Duration: TimeSpan.FromHours(7),
            PublicationDate: new DateOnly(2009, 4, 1)),
        new Course(
            Title: "Recreational Organ Transplantation",
            Category: "BIO", Number: 305, Duration: TimeSpan.FromHours(4),
            PublicationDate: new DateOnly(2002, 7, 19)),
        new Course(
            Title: "Hyperbolic Geometry",
            Category: "MAT", Number: 207, Duration: TimeSpan.FromHours(5),
            PublicationDate: new DateOnly(2007, 10, 5)),
        new Course(
            Title: "Oversimplified Data Structures for Demos",
            Category: "CSE", Number: 104, Duration: TimeSpan.FromHours(2),
            PublicationDate: new DateOnly(2023, 11, 14)),
        new Course(
            Title: "Introduction to Human Anatomy and Physiology",
            Category: "BIO", Number: 201, Duration: TimeSpan.FromHours(12),
            PublicationDate: new DateOnly(2001, 4, 11)),
    ];
}

Filtering

One of the simplest operators is Where, which filters its input. You provide a predicate, which is a function that takes an individual item and returns a bool. Where returns an object representing the items from the input for which the predicate is true. (Conceptually, this is very similar to the FindAll method available on List<T> and array types, but using deferred execution.)

As you’ve already seen, query expressions represent this with a where clause. However, there’s an overload of the Where operator that provides an additional feature not accessible from a query expression. You can write a filter lambda that takes two arguments: an item from the input and an index representing that item’s position in the source. Example 10-14 uses this form to exclude every second item from the input, and it also drops courses shorter than three hours.

Example 10-14. `Where` operator with index

IEnumerable<Course> q = Course.Catalog.Where(
    (course, index) => (index % 2 == 0) && course.Duration.TotalHours >= 3);

Indexed filtering is meaningful only for ordered data. It always works with LINQ to Objects, because that uses IEnumerable<T>, which produces items one after another, but not all LINQ providers process items in sequence. For example, with EF Core, the LINQ queries you write in C# will be handled on the database. Unless a query explicitly requests some particular order, a database is usually free to process items in whatever order it sees fit, possibly in parallel. In some cases, a database may have optimization strategies that enable it to produce the results a query requires using a process that bears little resemblance to the original query. So it might not even be meaningful to talk about, say, the 14th item handled by a WHERE clause. Consequently, if you were to write a query similar to Example 10-14 using EF Core, executing the query would cause an exception, complaining that the indexed Where operator is not available. If you’re wondering why the overload is even present if the provider doesn’t support it, it’s because EF Core uses IQueryable<T>, so all the standard operators are available at compile time; providers that choose to use IQueryable<T> can only report the nonavailability of operators at runtime.

Note

LINQ providers that implement some or all of the query logic on the server side usually limit what you can do in a query’s lambdas. Conversely, LINQ to Objects runs queries in process, so it lets you invoke any method from inside a filter lambda—if you want to call Console.WriteLine or read data from a file in your predicate, LINQ to Objects can’t stop you. But LINQ providers for databases need to be able to translate your lambdas into something the server can process, so they will reject expressions that use methods with no server-side equivalent.
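
As a small illustration of that freedom, here is a sketch of a LINQ to Objects predicate with side effects. The names array and the calls counter are hypothetical, introduced only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] names = { "Where", "Select", "OfType" };
int calls = 0;

IEnumerable<string> longNames = names.Where(name =>
{
    // LINQ to Objects just invokes this delegate, so arbitrary code
    // such as console output or counters is allowed here.
    calls += 1;
    Console.WriteLine($"Testing {name}");
    return name.Length > 5;
});

// Still 0: building the query has not run the predicate.
int callsBeforeEnumeration = calls;

// ToList enumerates the query, running the predicate once per item.
List<string> results = longNames.ToList();
Console.WriteLine($"{calls} calls, {results.Count} matches"); // 3 calls, 2 matches
```

A predicate like this could not be translated into something a database server can process, which is why providers that run queries on the server would reject it.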

Even so, you might have expected the exception to emerge when you invoke Where, instead of when you try to execute the query (i.e., when you first try to retrieve one or more items). However, providers that convert LINQ queries into some other form, such as a SQL query, typically defer all validation until you execute the query. This is because some operators may be valid only in certain scenarios, meaning that the provider may not know whether any particular operator will work until you’ve finished building the whole query. It would be inconsistent if errors caused by nonviable queries sometimes emerged while building the query and sometimes when executing it, so even in cases where a provider could determine earlier that a particular operator will fail, it will usually wait until you execute the query to tell you.

The filter lambda you supply to the Where operator must take an argument of the item type (the T in IEnumerable<T>, for example), and it must return a bool. You may remember from Chapter 9 that the runtime libraries define a suitable delegate type called Predicate<T>, but I also mentioned in that chapter that LINQ avoids this, and we can now see why. The indexed version of the Where operator cannot use Predicate<T>, because there’s an additional argument, so that overload uses Func<T, int, bool>. There’s nothing stopping the unindexed form of Where from using Predicate<T>, but LINQ providers tend to use Func across the board to ensure that operators with similar meanings have similar-looking signatures. Most providers therefore use Func<T, bool> instead, to be consistent with the indexed version. (C# doesn’t care which you use: query expressions still work if the provider uses Predicate<T>, but none of Microsoft’s providers do this.)
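
The following sketch puts the three delegate types side by side. The words list and the variable names are hypothetical; the signatures shown are those of List<T>.FindAll and the two Where overloads in LINQ to Objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

List<string> words = new() { "alpha", "beta", "gamma", "delta" };

// List<T>.FindAll is declared in terms of Predicate<T>.
Predicate<string> isShortPredicate = w => w.Length == 4;
List<string> viaFindAll = words.FindAll(isShortPredicate);

// The unindexed Where overload takes Func<T, bool>...
Func<string, bool> isShortFunc = w => w.Length == 4;
IEnumerable<string> viaWhere = words.Where(isShortFunc);

// ...and the indexed overload takes Func<T, int, bool>.
Func<string, int, bool> evenIndexFiveLetters =
    (w, index) => index % 2 == 0 && w.Length == 5;
IEnumerable<string> viaIndexedWhere = words.Where(evenIndexFiveLetters);

// There is no conversion between the two delegate types, so passing
// isShortPredicate to Where directly would not compile. Wrapping it
// in a lambda works.
IEnumerable<string> viaWrapper = words.Where(w => isShortPredicate(w));

Console.WriteLine(string.Join(", ", viaFindAll));      // beta
Console.WriteLine(string.Join(", ", viaWhere));        // beta
Console.WriteLine(string.Join(", ", viaIndexedWhere)); // alpha, gamma
Console.WriteLine(string.Join(", ", viaWrapper));      // beta
```

In everyday code you rarely name these delegate types, because a lambda passed inline converts to whichever one the method asks for. The distinction matters mainly when you store a filter in a variable and reuse it.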

Warning

The C# compiler’s nullability analysis doesn’t understand what LINQ operators do. Given an IEnumerable<string?>, writing xs.Where(s => s is not null) removes any null items, but Where will still return an IEnumerable<string?>. The compiler has no expectations around what Where will do, so it doesn’t understand that the output is effectively an IEnumerable<string>.
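
Here is a minimal sketch of the problem, along with two workarounds that are commonly used but are not prescribed by this chapter. The xs sequence is hypothetical. The first workaround relies on the standard OfType operator, which skips null items because a null reference is not an instance of any type; the second uses the null-forgiving operator to tell the compiler what the filter has already guaranteed.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

IEnumerable<string?> xs = new string?[] { "a", null, "b" };

// The nulls are gone at runtime, but the static type is unchanged.
IEnumerable<string?> stillNullable = xs.Where(s => s is not null);

// Workaround 1: OfType<string>() drops nulls and returns IEnumerable<string>.
IEnumerable<string> viaOfType = xs.OfType<string>();

// Workaround 2: filter, then assert non-nullness with the ! operator.
IEnumerable<string> viaSelect = xs.Where(s => s is not null).Select(s => s!);

Console.WriteLine(stillNullable.Count());       // 2
Console.WriteLine(string.Join(", ", viaOfType)); // a, b
Console.WriteLine(string.Join(", ", viaSelect)); // a, b
```

The second form is only as trustworthy as the filter in front of it: the ! operator suppresses the warning without adding any runtime check.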

LINQ defines another filtering operator: OfType. This is useful if your source contains a mixture of different item types—perhaps the source is an IEnumerable
