A.I, Data and Software Engineering

Advanced Python: collections


In part 2, we introduced advanced Python features: built-in functions and other useful tools for sequence iteration and data transformation. Part 3 continues with the collections module, whose container types help us manipulate our data.

Named tuple

Suppose we want to define a data structure to represent a geometric point on a typical x and y-axis. We could easily do this by defining a regular tuple with two elements, the x and y values of the point.

#create a variable to store x,y
p = (10, 20)

This may seem fine, but as the program becomes more complex, this kind of code easily loses its meaning and becomes hard to read, especially if we don't keep the names of all the point variables clear and meaningful.

Namedtuples solve this problem by giving a name to the tuple type itself and to each of its values. They also provide some helpful functions for working with them.

import collections

# create a Point namedtuple
Point = collections.namedtuple("Point", "x y")
# the field names can also be given as a list: ['x', 'y']

Operations on the namedtuple are now much easier to read.

p1 = Point(10, 20)
p2 = Point(30, 40)
print(p1.x, p1.y)
# use _replace to create a new instance
p1 = p1._replace(x=100)
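
The helper functions mentioned above can be sketched briefly. This is a minimal example that reuses the same Point type; the variable names (p, from_list, moved and so on) are illustrative only and not part of the original code.

```python
import collections

Point = collections.namedtuple("Point", "x y")

p = Point(10, 20)

# Fields can be read by name or by position, like a regular tuple
same_value = p.x == p[0]

# _fields lists the field names, _asdict() converts to a dictionary
field_names = Point._fields
as_dict = p._asdict()

# _make builds an instance from any iterable
from_list = Point._make([30, 40])

# namedtuples are immutable: assigning to a field raises AttributeError
try:
    p.x = 99
    mutated = True
except AttributeError:
    mutated = False

# _replace returns a new instance and leaves the original untouched
moved = p._replace(x=100)
print(field_names, as_dict, from_list, moved, mutated)
```

Because a namedtuple cannot be changed in place, _replace is the usual way to get a modified copy, which is why the earlier example rebinds p1 to the result.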

Default dict

A defaultdict creates a missing key the first time it is accessed and gives it a default value, so you don't have to handle missing keys manually. The following code demonstrates a counter that counts how many fruits of each type are in a list.

from collections import defaultdict
# define a list of items that we want to count
fruits = ['apple', 'pear', 'orange', 'banana',
              'apple', 'grape', 'banana', 'banana']
# use a dictionary to count each element
fruitCounter = defaultdict(int)
# Count the elements in the list
for fruit in fruits:
    fruitCounter[fruit] += 1
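
To see what the default value saves us, here is a small sketch that compares a plain dict with a defaultdict on the same fruit list, and then uses list as the factory to group items. The names plain_counter, fruit_counter and by_initial are made up for this illustration.

```python
from collections import defaultdict

fruits = ['apple', 'pear', 'orange', 'banana',
          'apple', 'grape', 'banana', 'banana']

# With a plain dict, a missing key must be handled explicitly
plain_counter = {}
for fruit in fruits:
    if fruit not in plain_counter:
        plain_counter[fruit] = 0
    plain_counter[fruit] += 1

# With defaultdict(int), a missing key starts at int(), which is 0
fruit_counter = defaultdict(int)
for fruit in fruits:
    fruit_counter[fruit] += 1

# Any zero-argument callable works as the factory; list groups values
by_initial = defaultdict(list)
for fruit in fruits:
    by_initial[fruit[0]].append(fruit)

print(dict(fruit_counter))
print(dict(by_initial))
```

One thing to keep in mind: merely reading a missing key from a defaultdict also inserts it with the default value, so use the in operator or get() when you only want to check for a key.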


Counter

Counter is a dict subclass for counting hashable objects. It is a collection where elements are stored as dictionary keys and their counts are stored as dictionary values. Counts can be any integer value, including zero or negative, and a Counter is convenient for computing basic statistics over a sequence.

from collections import Counter

# list of students in class 1
class1 = ["Bob", "James", "Chad", "Darcy", "Penny", "Hannah",
          "Kevin", "James", "Melanie", "Becky", "Steve", "Frank"]
# list of students in class 2
class2 = ["Bill", "Barry", "Cindy", "Debbie", "Frank",
          "Gabby", "Kelly", "James", "Joe", "Sam", "Tara", "Ziggy"]
# Create a Counter for class1 and class2
c1 = Counter(class1)
c2 = Counter(class2)
# How many students in class 1 named James?
print(c1["James"])
# How many students are in class 1?
print(sum(c1.values()), "students in class 1")
# Combine the two classes
c1.update(class2)
print(sum(c1.values()), "students in class 1 and 2")
# What's the most common name in the two classes?
print(c1.most_common(1))
# Separate the classes again
c1.subtract(class2)
# What's common between the two classes?
print(c1 & c2)
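
The remark about zero and negative counts is easier to see with a small example. This is a minimal sketch with made-up data (the stock and sold counters are hypothetical) that contrasts the in-place subtract() method with the arithmetic operators.

```python
from collections import Counter

stock = Counter(apple=3, pear=1)
sold = Counter(apple=1, pear=2)

# subtract() works in place and keeps zero and negative counts
remaining = Counter(stock)
remaining.subtract(sold)

# The - operator returns a new Counter and drops counts that are not positive
positive_only = stock - sold

# + adds counts, & takes the minimum, | takes the maximum of each count
combined = stock + sold
shared = stock & sold
largest = stock | sold

# A missing key returns 0 instead of raising KeyError
missing = stock["banana"]
print(remaining, positive_only, combined, shared, largest, missing)
```

The operators are handy when only positive counts make sense, while subtract() is the one to reach for when a negative balance carries meaning.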

Deque (Double Ended Queue)

A deque (pronounced /dek/) lets you append or pop data at either end, and it is designed so that these operations stay fast and memory-efficient on both the left and the right side. Deques can be initialized either empty or from an existing iterable object, and they can also be given a maximum length.

import collections
import string
# initialize a deque with lowercase letters
d = collections.deque(string.ascii_lowercase)
# deques support the len() function
print("Item count: " + str(len(d)))
# deques can be iterated over
for elem in d:
    print(elem.upper(), end=",")
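print()
# manipulate items from either end
d.pop()           # remove and return the rightmost item
d.popleft()       # remove and return the leftmost item
d.append(2)       # add an item to the right end
d.appendleft(1)   # add an item to the left end
# rotate the deque one step to the right
d.rotate(1)
print(d)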
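
The maximum length mentioned above deserves a short illustration. The sketch below uses made-up page names and a maxlen of 3, both chosen only for this example, to show how a bounded deque discards items from the opposite end once it is full, and how rotate() behaves with positive and negative arguments.

```python
from collections import deque

# Keep only the 3 most recent items (the limit of 3 is arbitrary)
recent = deque(maxlen=3)
for page in ["home", "about", "blog", "contact"]:
    recent.append(page)   # "home" is dropped when "contact" arrives
after_appends = list(recent)

# appendleft on a full deque discards from the right end instead
recent.appendleft("start")

# rotate(n) moves n items from the right end to the left;
# a negative n rotates in the other direction
d = deque([1, 2, 3, 4, 5])
d.rotate(2)
after_right = list(d)
d.rotate(-2)
after_back = list(d)
print(after_appends, list(recent), after_right, after_back)
```

A bounded deque like this is a simple way to keep a rolling window of recent items without trimming a list by hand.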

