Collections
In all object-oriented (and functional) languages, there are many data structures designed to assist programmers for various applications. A significant group of useful data structures is known as collections. A collection is a generic class that organizes multiple objects in a specific manner and provides operations for access and modification. The usefulness of collections lies in their specialization, which determines how much memory and time are consumed for different operations.
The most commonly used collections are lists, sets, and dictionaries/maps:
- List<T> - the most commonly used type of list in C#, with the underlying implementation being an array list. It has good performance for indexed access in O(1), but insertions and deletions are in the worst case O(n).
- LinkedList<T> - implements a doubly linked list with slightly poorer performance compared to List<T>. Indexed access is O(n), but insertion and deletion at the list ends are O(1).
- HashSet<T> - an unordered set with high performance that does not allow duplicate elements. It uses the GetHashCode() and Equals() methods of type T to determine where an element should be placed in its internal structure, which is a hash table. This means T needs to override these methods. It has near O(1) average performance for all operations on average types but can become O(n) for a large number of elements.
- SortedSet<T> - an ordered set with constant good performance as it maintains element order and doesn't allow duplicates. Elements of type T must implement IComparable<T> or be given an IComparator<T> in the collection's constructor to establish the element order. It internally maintains a red-black tree. Access and modification operations on this set are always O(log n).
- Dictionary<TKey, TValue> - an unordered map/table with keys of type TKey to access values of type TValue. It internally maintains a hash table like HashSet and behaves similarly, relying on TKey's GetHashCode() and Equals().
- SortedDictionary<TKey, TValue> - an ordered map/table with keys of type TKey to access values of type TValue. It internally maintains a red-black tree like SortedSet and behaves similarly, relying on an IComparator<TKey> or TKey implementing IComparable<TKey>.
- SortedList<TKey, TValue> - works similarly to SortedDictionary<TKey, TValue> but with an internal sorted list. Due to this, although access is O(log n), insertions and deletions are O(n), making it faster only for indexed value access.
Other useful collections include:
- Stack<T> - implements a Last-In-First-Out (LIFO) queue.
- Queue<T> - implements a First-In-First-Out (FIFO) queue.
- PriorityQueue<TElement, TPriority> - implements a priority queue for elements of type TElement, ordered by values of type TPriority, given an IComparator<TPriority> or having TPriority implement IComparable<TPriority>.
As you can see, you need to be cautious about which data structure you choose for your programs because it will impact performance and the logic of your program's use of one collection over another. Most likely, you'll use List<T> and Dictionary<TKey, TValue> classes in most cases, but it's good to know about the other data structures when you need them.
Resources
📄️ Enumerations
In addition to collections, C# also has the concept of enumerations (enumerables) which encapsulate the concept of an iterable data structure, meaning an enumeration can be iterated through. All collections implement the IEnumerable<T> interface and are enumerations. Enumerations expose an enumerator, which in C# is equivalent to an iterator for traversing the collection. An enumerator/iterator is an object that is associated with an instance of a collection and keeps track of its current position in that collection in order to implement traversal directives like foreach.
📄️ LINQ
Before diving into the details of LINQ, let's provide a brief introduction to functional types. In most object-oriented programming languages, functional types are supported. You can pass functions as parameters to other functions or return functions. In C#, you can use Func<TIn1, TIn2, ..., TIn14, TResult> (which can have 0 to 14 input types) to represent a function that takes parameters of types TIn1, TIn2, ..., TIn14 and returns a TResult, and Action<TIn1, TIn2, ..., TIn14>, which is the same thing except it doesn't return anything. Both types are delegates, essentially the C# equivalent of function pointers. Additionally, you can declare inline functions, or what are actually known as lambda functions, like so: