A collection can be thought of as a group of things. For example, you could have a collection of toys, books, stuff in your room, etc. Collections don’t have a particular order and the types of things they contain can vary. For example, if you reorder a collection of books, it’s still a collection of books. If you ask for book number three, it doesn’t really mean anything.
This lack of structure reduces the usefulness of collections in the context of programming. To get around this, data structures have been developed that take collections and logical rules to them. This post will explore some of the most common list-based collections.
Arrays are just collections that have an order to them. Depending on the programming language, arrays can hold elements of different types or they can be uniform. In some languages arrays have a set size and in some they don’t. Across all languages, each element in an array will have a location called an index. Think of this as a number assignment starting from 0. For example, index 1 of the array [10, ‘purple’, 9, 30] would be ‘purple’.
Pros: Easy to access a location. For example, if you need to access the location in the middle of the array you can just keep track of how long the array is calculate the middle index from that.
Cons: Insertion and deletion can be costly. If you add something to the middle of an array, you have to shift everything that comes after it to a new index location. In the worst case, this time on the order of O(n). Deletion works the same way.
In Python, there is a built-in data structure called a ‘list’ that is built as an array. The time complexity of various list based methods can be found at https://wiki.python.org/moin/TimeComplexity. An example of lists in Python:
# Initialize list my_list = ['red', 'green', 'blue'] # get value at index 0 print my_list # returns 'red' # add 'orange' to the end of the list my_list.append('orange') print my_list # returns ['red', 'green', 'blue', 'orange']
Unlike an array, a linked list has no indices. Instead, elements of a linked list keep track of what element comes after them (the link). The elements may not know where they are in relation to the larger list, but they know where they are in their neighborhood. For example, an element in an Array will store a ‘value’ and an ‘index’, while an element in a linked list will store a ‘value’ and a ‘next value’.
Pros: Easy to insert and delete. Adding an element is as easy as changing the ‘next value’ that comes before it. Happens in constant time O(1).
Cons: Hard to select a single location, like the middle.
An example of a Linked List class coded in Python:
class Element(object): def __init__(self, value): self.value = value self.next = None class LinkedList(object): def __init__(self, head=None): self.head = head def append(self, new_element): current = self.head if self.head: while current.next: current = current.next current.next = new_element else: self.head = new_element def get_position(self, position): """ Get an element from a particular position. Assume the first position is "1". Return "None" if position is not in the list. """ counter = 1 current = self.head if position < 1: return None while current and counter <= position: if counter == position: return current current = current.next counter += 1 return None def insert(self, new_element, position): """ Insert a new node at the given position. Assume the first position is "1". Inserting at position 3 means between the 2nd and 3rd elements. """ counter = 1 current = self.head if position > 1: while current and counter < position: if counter == position - 1: new_element.next = current.next current.next = new_element current = current.next counter += 1 elif position == 1: new_element.next = self.head self.head = new_element def delete(self, value): """ Delete the first node with a given value. """ current = self.head previous = None while current.value != value and current.next: previous = current current = current.next if current.value == value: if previous: previous.next = current.next else: self.head = current.next
A stack can be thought of like a stack of books in real life, you have easy access to add or remove a book on top of the stack, but if you want to see the book on the bottom you have remove all the books on top. Adding an element to a stack is called a push and removing an element is called a pop. Stacks function in a “last in, first out” sense. You will always pop the last element that was pushed.
Pros: Useful when you only care about the most recent element or the order in which you saved elements. Pushing and popping from the top of the stack happens in constant time O(1).
Cons: Unable to easily access elements in the middle of the stack.
In Python, the list data type can be used as a stack by using append() and pop(). An example of a stack class built from scratch:
class Element(object): def __init__(self, value): self.value = value self.next = None class LinkedList(object): def __init__(self, head=None): self.head = head def append(self, new_element): current = self.head if self.head: while current.next: current = current.next current.next = new_element else: self.head = new_element def insert_first(self, new_element): """ Insert new element as the head of the LinkedList """ new_element.next = self.head self.head = new_element def delete_first(self): """ Delete the first (head) element in the LinkedList and return it """ deleted = self.head if self.head: self.head = self.head.next deleted.next = None return deleted class Stack(object): def __init__(self,top=None): self.ll = LinkedList(top) def push(self, new_element): """ Push (add) a new element onto the top of the stack """ self.ll.insert_first(new_element) def pop(self): """ Pop (remove) the first element off the top of the stack and return it """ return self.ll.delete_first()
Queues are like stacks but instead of “last in, first out” they use “first in, first out”. Think of a line of people waiting to buy tickets, the first person to get in line will be the first person called up to the ticket booth. The last person to get in line will be the last. The first element in a queue is called the head and the last element is called the tail. Adding to the tail end is called an enqueue and removing from the head end is called a dequeue.
Pros: Useful when you care about ranking elements based on how long they have been in the queue and making sure you get to every element. Operations happen in constant time O(1).
Cons: Unable to easily access elements in the middle of the Queue.
In Python, the collections.deque module is built-in as a double-ended queue. An example of a queue class built from scratch:
class Queue(object): def __init__(self, head=None): self.storage = [head] def enqueue(self, new_element): self.storage.append(new_element) def peek(self): return self.storage def dequeue(self): return self.storage.pop(0)
Your choice of data structure will always depend on your goal.
If you need to be able to access every element directly and in constant time, use an array. Think of a book. Each page can be opened independently of the others and it is easy to find only the page you are looking for.
If you need to be able to access and modify elements sequentially, use a linked list. Think of a roll of film. You have to unroll the whole thing to get the element you want, but it is easy to splice in a frame.
If you need to be able to access the most recent item added to a list, use a stack. For example, if you wanted to reverse the letters in a word, you could push each letter to a stack and then pop them out in order.
If you need to be able to access every item added to a list in order, use a queue. For example, if you want to implement a breadth-first search, you could add all the nearest possibilities to a queue and explore them each in the order added. Repeat the process until the search finds its target.