London or Chicago? It Depends on Many Refactors

rezilion December 18, 2020

Introduction

In this post we discuss two styles of Test Driven Development (TDD): Bottom Up (Chicago style) and Top Down (London style, also known as mockist style). We show how both styles can be used in application development and how refactoring may affect the tests.

Demonstration System

In the example application, we implement a count vectorizing service in Python. The service fetches documents from the database, transforms them into numeric features and saves the features back into the database. In simple words, it counts how many times each word occurs in each document and converts the counts to vectors. Dynamic languages like Python do not have the protection that compilers of static languages provide, so it is even more important to test your code in dynamic languages. Feature encoding is an important step in machine learning pipelines that work with text, because machine learning algorithms work with vectors of numbers and not text. In our example every sentence is converted into a vector, where each index holds the frequency of one token in the sentence. Tokens in the example are simple words. To keep the code short, the implementation does not take into account some edge cases (for example, transforming a word that was not in the corpus).
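
To make the encoding concrete, here is a minimal sketch in plain Python, independent of the classes developed later in this post. The two-sentence corpus and the variable names are assumptions made up for illustration.

```python
# Hypothetical two-sentence corpus, used only for illustration.
documents = ["the cat sat", "the cat saw the dog"]

# Vocabulary in order of first appearance: index i stands for vocabulary[i].
vocabulary = []
for document in documents:
    for word in document.split():
        if word not in vocabulary:
            vocabulary.append(word)

# One count vector per document, aligned with the vocabulary.
vectors = [[document.split().count(word) for word in vocabulary]
           for document in documents]

print(vocabulary)  # ['the', 'cat', 'sat', 'saw', 'dog']
print(vectors)     # [[1, 1, 1, 0, 0], [2, 1, 0, 1, 1]]
```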

Why TDD?

Advantages of tested code

Before we begin comparing the techniques, let’s first understand why TDD is an extremely valuable technique to use during development of serious projects:

Regression protection. When some new feature introduces unwanted behavior (bug) to the system, well written tests are supposed to alert the user about code regression. If a developer encountered a bug during development, best practice is to add unit tests to the module of interest to prevent the bug in the future.
Change management. When a developer changes some legacy code and something breaks, this code becomes theirs to deal with. If the code is not tested, developers will think twice before changing it, even to make it better. Even a simple refactor that can improve code quality will be questioned. Why bother making improvements, if the code can break and it then becomes my responsibility to fix it? A good test suite removes this crippling fear and allows developers to add features faster and improve the code.
Decoupled and testable architecture. This is the most important point. When tests are written before production code, the production code tends to emerge in such a way that different modules can be tested and deployed independently, and therefore reused easily. Writing tests forces one to make the code testable, which sometimes results directly in design decisions, and usually good ones.
The test is the only enforced definition of the problem to be solved – if it doesn’t exist, then there is no enforced definition – only some vague definition in human language written in some document that will be forgotten. Thus tests are the ultimate definition and documentation of the code.

Why write tests before implementing code?

If a programmer starts with implementation before writing tests, the final code may have several disadvantages compared to writing tests first:

Implementor bias. Because the programmer already knows the implementation, they will write the tests with this knowledge in mind, which can lead to tests that check the current implementation rather than the overall logic of the module. It’s important to stress that in the TDD work cycle, one must see the test fail properly before writing the code. Tests written after the fact have never been seen to fail, so they are often simply wrong.
The code and tests may be less maintainable. Because the programmer writes the code without thinking about the tests, it is much easier to mix different responsibilities in the same class and introduce complex logic. This leads to tests that are hard to maintain.
When writing tests before implementing logic, the programmer more naturally considers the problem they need to solve, rather than the solution they might already have in mind. Working on test cases before the solution can expose flaws in the solution early on.

TDD styles case study

Bottom up (Chicago)

This style dictates that writing a module must start from the deepest core of business logic and go up one class (or other logical unit) at a time. In a perfect scenario all tests depend only on entities that are already implemented. If a class has another class as a collaborator, it is advisable to use the collaborator as it is and not to mock it. The only collaborators that are mocked are those that do IO or other out-of-process operations. Sometimes, when a collaborator is extremely complex, mocking it can reduce test setup complexity. However, this should be used with caution: a complex collaborator is an indicator of bad design. A much better approach is to split it into a few simple collaborators and not to mock them (we see again the connection between testability and good design). As a result, the assertions in this style are usually state-based or output-based. In contrast to behavior-based assertions, we care about what the results are, and not so much about how we got them.

This method requires that the programmers do basic design work before they start writing a module, because programmers must understand what the business logic of a system is and what is an unimportant detail. This design is required, because these core business entities will be developed before other concepts of the system.

Example

Let’s implement our count vectorizer in bottom up TDD style. The tests are written one at a time. The first step is to see the test fail. The next is to implement only enough code to make the test pass. After that, refactor the code to keep it readable and well designed. This is known as the Red, Green, Refactor cycle. I always start from a degenerate test case, usually empty input, and gradually increase the complexity of my tests.

Before I start writing my first test, I do some basic design and try to understand which entities represent the core business logic of the system.

I decided to separate token parsing and token encoding into separate classes. While this decision allows me to add different parsers and encoders without changing existing code, and thus comply with the Open Closed Principle, it somewhat violates the YAGNI heuristic.

Let’s look at the Tokenizer and Transformer classes, starting with Tokenizer.

The purpose of the Tokenizer class is to split a raw text document into a list of tokens. In our application tokens are simply whitespace-separated words, without additional assumptions. All assumptions should be “documented” by writing tests. For example, if the requirement is that tokens must not contain commas, there should be a test that checks this behavior.

We define “SUT” as System Under Test.

Tokenizer test:

def contains_exactly(actual, expected):
    # Order-insensitive comparison that still accounts for repeated tokens.
    return sorted(actual) == sorted(expected)

def test_empty_string():
    sut = Tokenizer()
    assert contains_exactly(sut.tokenize(""), [])

def test_single_word():
    sut = Tokenizer()
    assert contains_exactly(sut.tokenize("hello"), ["hello"])

def test_single_sentence():
    sut = Tokenizer()
    assert contains_exactly(sut.tokenize("hello world"), ["hello", "world"])

def test_multiline_text():
    sut = Tokenizer()
    assert contains_exactly(sut.tokenize("hello world\nNew sentence"), ["hello", "world", "New", "sentence"])

Implementation:

class Tokenizer:
    def tokenize(self, text):
        return text.strip().split()

This tokenizer is trivial and does not take into account edge cases.
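
As an illustration of documenting an assumption with a test, here is a minimal sketch of the comma requirement mentioned earlier. The class `CommaAwareTokenizer` and the test name are hypothetical and not part of the example application; treating commas as separators is just one possible reading of such a requirement.

```python
class CommaAwareTokenizer:
    """Hypothetical variant: treats commas as separators, like whitespace."""

    def tokenize(self, text):
        return text.replace(",", " ").split()


def test_commas_are_not_part_of_tokens():
    sut = CommaAwareTokenizer()
    assert sut.tokenize("hello, world,again") == ["hello", "world", "again"]


test_commas_are_not_part_of_tokens()
```

In the TDD cycle the test would be written first and seen to fail against the plain Tokenizer, and only then would the comma handling be added.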

The purpose of the Transformer class is to transform sequences of tokens into a feature vector. The transform function receives a list of words and a vector of known tokens, and returns a vector with counts corresponding to the known-tokens, essentially a histogram of how many times the known tokens were used in a given word list. For example, if the word “test” is located at index 5 in a list of known tokens, then in the final feature vector the index 5 represents the number of occurrences the word “test” has in the list of words.

Tests:

INDEXED_TOKENS = [f'word{i}' for i in range(5)]

def test_empty_document():
    sut = TokenTransformer()
    actual = sut.transform([], INDEXED_TOKENS)
    expected = [0, 0, 0, 0, 0]
    assert expected == actual

def test_unique_words_in_document():
    sut = TokenTransformer()
    actual = sut.transform(['word1', 'word2'], INDEXED_TOKENS)
    expected = [0, 1, 1, 0, 0]
    assert expected == actual

def test_nonunique_words_in_document():
    sut = TokenTransformer()
    actual = sut.transform(['word1', 'word2', 'word1', 'word2', 'word1'], INDEXED_TOKENS)
    expected = [0, 3, 2, 0, 0]
    assert expected == actual

Implementation:

class TokenTransformer:
    def transform(self, tokenized_document, indexed_tokens):
        res = [0 for _ in indexed_tokens]

        for token in tokenized_document:
            token_index = indexed_tokens.index(token)
            res[token_index] += 1

        return res

Currently it has a single method transform that receives a tokenized document and index of tokens created from the full corpus of documents.
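
The unknown-word edge case mentioned in the introduction can be pinned down with a test as well. The sketch below only records what the implementation above does today, namely that `list.index` raises `ValueError` for a token that is missing from the index. Whether raising is the desired behavior is a requirements question. The test name is hypothetical, and TokenTransformer is repeated only so that the snippet runs on its own.

```python
class TokenTransformer:
    # Same implementation as above, repeated so the sketch is self-contained.
    def transform(self, tokenized_document, indexed_tokens):
        res = [0 for _ in indexed_tokens]
        for token in tokenized_document:
            token_index = indexed_tokens.index(token)
            res[token_index] += 1
        return res


def test_unknown_token_raises():
    sut = TokenTransformer()
    try:
        sut.transform(["unseen"], ["word0", "word1"])
    except ValueError:
        return  # list.index raises ValueError for a missing item
    raise AssertionError("expected ValueError for a token outside the index")


test_unknown_token_raises()
```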

The purpose of CountVectorizer is to transform many raw text documents into a list of feature vectors that represents the documents.
Now CountVectorizer depends on Tokenizer and Transformer. This allows you to change how a document is tokenized or how a feature vector is created without changing any code in CountVectorizer. Overall this is a premature generalization, which YAGNI advises against, because abstractions are discovered and not pre-defined, but let’s assume that we already have a few types of transformers here. I use dependency injection to allow inserting test apparatus into the system under test. The dependency injection is done directly via the CountVectorizer constructor.

Tests:

def create_vectorizer():
    tokenizer = Tokenizer()
    transformer = TokenTransformer()
    return CountVectorizer(transformer, tokenizer)

def test_no_documents():
    sut = create_vectorizer()
    actual = sut.vectorize([])
    expected = []
    assert actual == expected

def test_single_document():
    sut = create_vectorizer()
    actual = sut.vectorize(["this is single document single document"])
    expected = [[1, 1, 2, 2]]
    assert actual == expected

def test_many_documents_document():
    sut = create_vectorizer()
    actual = sut.vectorize(["this is single document single document",
                            "my second document",
                            "my third document"])

    expected = [[1, 1, 2, 2, 0, 0, 0],
                [0, 0, 0, 1, 1, 1, 0],
                [0, 0, 0, 1, 1, 0, 1]]

    assert actual == expected

In the test, we assert expectation only on output of the CountVectorizer, meaning, when the API of one of the collaborators is changed, this will not affect the CountVectorizer tests. Another advantage is that these unit tests are also mini integration tests, because real instances of collaborators are used.

Implementation:

class CountVectorizer:
    def __init__(self, transformer, tokenizer):
        self._transformer = transformer
        self._tokenizer = tokenizer

    def vectorize(self, text_documents):
        tokenized_documents = self._tokenize(text_documents)
        indexed_tokens = self._build_indexed_tokens(tokenized_documents)
        vectors = self._vectorize(tokenized_documents, indexed_tokens)

        return vectors

    def _tokenize(self, text_documents):
        return [self._tokenizer.tokenize(d) for d in text_documents]

    def _vectorize(self, tokenized_documents, indexed_tokens):
        return [self._transformer.transform(d, indexed_tokens) for d in tokenized_documents]

    def _build_indexed_tokens(self, tokenized_documents):
        indexed_tokens = []

        for document in tokenized_documents:
            for token in document:
                if not token in indexed_tokens:
                    indexed_tokens.append(token)

        return indexed_tokens

Note that tests are asserting only on output of `CountVectorizer` and not checking any interactions of `CountVectorizer` with its collaborators. This is possible because all of the collaborators are already implemented.
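
To illustrate what the constructor injection buys us, here is a minimal sketch that swaps in a different tokenizer without touching CountVectorizer. `LowercaseTokenizer` is a hypothetical class invented for this example. The other classes are condensed copies of the ones above (the transformer uses `list.count`, which gives the same result for tokens that are in the index).

```python
class Tokenizer:
    def tokenize(self, text):
        return text.strip().split()


class LowercaseTokenizer:
    """Hypothetical alternative tokenizer: same interface, case-insensitive."""

    def tokenize(self, text):
        return text.lower().split()


class TokenTransformer:
    def transform(self, tokenized_document, indexed_tokens):
        return [tokenized_document.count(token) for token in indexed_tokens]


class CountVectorizer:
    def __init__(self, transformer, tokenizer):
        self._transformer = transformer
        self._tokenizer = tokenizer

    def vectorize(self, text_documents):
        tokenized = [self._tokenizer.tokenize(d) for d in text_documents]
        indexed_tokens = []
        for document in tokenized:
            for token in document:
                if token not in indexed_tokens:
                    indexed_tokens.append(token)
        return [self._transformer.transform(d, indexed_tokens) for d in tokenized]


documents = ["Hello hello world"]
case_sensitive = CountVectorizer(TokenTransformer(), Tokenizer())
case_insensitive = CountVectorizer(TokenTransformer(), LowercaseTokenizer())

print(case_sensitive.vectorize(documents))    # [[1, 1, 1]]
print(case_insensitive.vectorize(documents))  # [[2, 1]]
```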

Service is our controller object: it does all the IO and executes the CountVectorizer, which performs all business logic calculations. It is desirable for the core classes that are responsible for business logic calculations to use IO collaborators as little as possible, which makes them more maintainable and testable.

There are several approaches in classical TDD to test controllers:

Use mocks. Usually classic TDD practitioners use hand written mocks – Spies for outgoing communications testing and Fake/Stubs/Dummies for incoming communications. Tests written with spies are implicitly or explicitly asserting on outgoing operations of sut on its collaborators. These assertions are coupling of the test to the implementation.
Use In memory DB. One advantage of in-memory DB is that it can be used for a long time during the development process, until a real DB is necessary. A disadvantage is that all required APIs of DB must be implemented.
Test controllers with integration tests. This approach is most resilient to refactoring, because there are no assumptions about the internal structure of the code. I personally prefer this approach for classes like Service.

Let’s look at Service with Spy:

Tests:

class SpyRepository:
    def __init__(self, documents):
        self._documents = documents
        self.save_called_with = None

    def save(self, documents):
        self.save_called_with = documents

    def text_documents(self):
        return self._documents

def create_vectorizer():
    tokenizer = Tokenizer()
    transformer = TokenTransformer()
    return CountVectorizer(transformer, tokenizer)

def create_service(documents):
    vectorizer = create_vectorizer()
    repository = SpyRepository(documents)
    service = Service(vectorizer, repository)
    return service, repository

def test_no_documents_in_repository():
    service, repository = create_service([])
    service.run()
    assert repository.save_called_with == []

def test_two_documents_in_db():
    service, repository = create_service(["first document", "second document"])
    service.run()
    assert repository.save_called_with == [[1, 1, 0], [0, 1, 1]]

Implementation:

class Service:
    def __init__(self, vectorizer, repository):
        self._vectorizer = vectorizer
        self._repository = repository

    def run(self):
        text_documents = self._repository.text_documents()
        encoded_documents = self._vectorizer.vectorize(text_documents)
        self._repository.save(encoded_documents)

Note that the spy is not a full implementation of the DB. It implements only the required API and records the arguments that save was called with, so that the test can assert that the method was called with the correct parameters. This couples the tests to the implementation.

Now let’s look at Service with In Memory DB.

Tests:

class InMemoryRepository:
    def __init__(self):
        self._features = []
        self._documents = []

    def save_features(self, features):
        self._features.extend(features)

    def text_documents(self):
        return self._documents

    def features(self):
        return self._features

    def save_text_documents(self, documents):
        self._documents.extend(documents)

def create_vectorizer():
    tokenizer = Tokenizer()
    transformer = TokenTransformer()
    return CountVectorizer(transformer, tokenizer)

def create_service(documents):
    vectorizer = create_vectorizer()
    repository = InMemoryRepository()
    repository.save_text_documents(documents)
    service = Service(vectorizer, repository)
    return service, repository

def test_no_documents_in_repository():
    service, repository = create_service([])
    service.run()
    assert repository.features() == []

def test_two_documents_in_db():
    service, repository = create_service(["first document", "second document"])
    service.run()
    assert repository.features() == [[1, 1, 0], [0, 1, 1]]

There are no assertions on interactions in this case, however additional APIs of DB had to be implemented, to support inserting required data into DB.

Implementation is the same as with the Spy case.
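
The third option, an integration test, has no example above, so here is a rough sketch of what it might look like. Everything database-related is an assumption: `SqliteRepository`, its table and column names, and the choice of storing vectors as JSON text are invented for illustration, and an in-memory SQLite database stands in for whatever database the real service would use. The other classes are condensed copies of the ones in this post.

```python
import json
import sqlite3


class Tokenizer:
    def tokenize(self, text):
        return text.strip().split()


class TokenTransformer:
    def transform(self, tokenized_document, indexed_tokens):
        return [tokenized_document.count(token) for token in indexed_tokens]


class CountVectorizer:
    def __init__(self, transformer, tokenizer):
        self._transformer = transformer
        self._tokenizer = tokenizer

    def vectorize(self, text_documents):
        tokenized = [self._tokenizer.tokenize(d) for d in text_documents]
        indexed_tokens = []
        for document in tokenized:
            for token in document:
                if token not in indexed_tokens:
                    indexed_tokens.append(token)
        return [self._transformer.transform(d, indexed_tokens) for d in tokenized]


class Service:
    def __init__(self, vectorizer, repository):
        self._vectorizer = vectorizer
        self._repository = repository

    def run(self):
        text_documents = self._repository.text_documents()
        self._repository.save(self._vectorizer.vectorize(text_documents))


class SqliteRepository:
    """Hypothetical repository; table and column names are assumptions."""

    def __init__(self, connection):
        self._connection = connection
        connection.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT)")
        connection.execute("CREATE TABLE features (id INTEGER PRIMARY KEY, vector TEXT)")

    def text_documents(self):
        rows = self._connection.execute("SELECT body FROM documents ORDER BY id")
        return [body for (body,) in rows]

    def save(self, vectors):
        self._connection.executemany(
            "INSERT INTO features (vector) VALUES (?)",
            [(json.dumps(vector),) for vector in vectors])


def test_two_documents_end_to_end():
    connection = sqlite3.connect(":memory:")
    repository = SqliteRepository(connection)
    connection.executemany("INSERT INTO documents (body) VALUES (?)",
                           [("first document",), ("second document",)])

    Service(CountVectorizer(TokenTransformer(), Tokenizer()), repository).run()

    # The assertion reads the stored rows directly, not through the repository API.
    rows = connection.execute("SELECT vector FROM features ORDER BY id")
    assert [json.loads(vector) for (vector,) in rows] == [[1, 1, 0], [0, 1, 1]]


test_two_documents_end_to_end()
```

The test knows nothing about how Service talks to its collaborators. It only sets up input rows and inspects output rows, which is why this style tends to survive internal refactoring.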

Architectural decisions during the TDD process are usually very subjective. Bottom up TDD tends to add some unnecessary abstractions, however the resulting code is well decoupled and all source code dependencies are pointing from high level objects into lower business level objects, just like in our example. Count vectorizer has no knowledge about DB or other unimportant details, it contains only the business logic of the system.

Refactoring in bottom up

The purpose of this section is to show how refactoring of the code written in bottom up style may affect tests.

Let’s assume that we have a new requirement to add different types of encodings, like TF-IDF. To comply with the requirement I decided to change the transformer API but keep all other APIs intact. We need to move the creation of indexed tokens into the transformer because the new encoding algorithms require knowledge of the whole corpus.

Note that we are not refactoring TokenTransformer, but we are actually changing functionality because we change the API of the class. We refactor the CountVectorizer because we are not changing its public API, only its interactions with collaborators.

New transformer tests (each test now calls create_indexed_tokens first, and transform takes a single argument):

TOKENIZED_DOCUMENTS = [['word0', 'word1', 'word2', 'word1'], ['word3', 'word4']]

def test_empty_document():
    sut = TokenTransformer()
    sut.create_indexed_tokens(TOKENIZED_DOCUMENTS)
    actual = sut.transform([])
    expected = [0, 0, 0, 0, 0]
    assert expected == actual

def test_unique_words_in_document():
    sut = TokenTransformer()
    sut.create_indexed_tokens(TOKENIZED_DOCUMENTS)
    actual = sut.transform(['word1', 'word2'])
    expected = [0, 1, 1, 0, 0]
    assert expected == actual

def test_nonunique_words_in_document():
    sut = TokenTransformer()
    sut.create_indexed_tokens(TOKENIZED_DOCUMENTS)
    actual = sut.transform(['word1', 'word2', 'word1', 'word2', 'word1'])
    expected = [0, 3, 2, 0, 0]
    assert expected == actual

The implementation:

The new implementation requires calling `create_indexed_tokens` with the full corpus of documents before calling transform on one or more documents.

class TokenTransformer:
    def __init__(self):
        self._indexed_tokens = None

    def create_indexed_tokens(self, tokenized_documents):
        indexed_tokens = []

        for document in tokenized_documents:
            for token in document:
                if not token in indexed_tokens:
                    indexed_tokens.append(token)
        self._indexed_tokens = indexed_tokens

    def transform(self, tokenized_document):
        res = [0 for _ in self._indexed_tokens]

        for token in tokenized_document:
            token_index = self._indexed_tokens.index(token)
            res[token_index] += 1
        return res

CountVectorizer is no longer responsible for calculating the index of tokens. It only has to call a new method of transformer because it is now transformer’s responsibility.

Refactored CountVectorizer implementation:

class CountVectorizer:
    def __init__(self, transformer, tokenizer):
        self._transformer = transformer
        self._tokenizer = tokenizer

    def vectorize(self, text_documents):
        tokenized_documents = self._tokenize(text_documents)
        vectors = self._vectorize(tokenized_documents)
        return vectors

    def _tokenize(self, text_documents):
        return [self._tokenizer.tokenize(d) for d in text_documents]

    def _vectorize(self, tokenized_documents):
        self._transformer.create_indexed_tokens(tokenized_documents)
        return [self._transformer.transform(d) for d in tokenized_documents]

A very important note here: the `CountVectorizer` tests still pass without any changes. The reason is that no assertion is made on communication with collaborators, only the output is asserted. This is a great benefit, because it makes the tests resilient to refactoring. Note that the API of CountVectorizer is unchanged.
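
To show why an encoding such as TF-IDF needs the whole corpus, here is a minimal sketch of a transformer with the same two-method API. `TfIdfTransformer` is hypothetical and not part of the example application. The weighting used here, raw count times log(N / document frequency), is only one of several common TF-IDF variants.

```python
import math


class TfIdfTransformer:
    """Hypothetical transformer with the same API as the new TokenTransformer."""

    def __init__(self):
        self._indexed_tokens = None
        self._idf = None

    def create_indexed_tokens(self, tokenized_documents):
        indexed_tokens = []
        for document in tokenized_documents:
            for token in document:
                if token not in indexed_tokens:
                    indexed_tokens.append(token)
        self._indexed_tokens = indexed_tokens

        # IDF needs the whole corpus: in how many documents does each token occur?
        n_documents = len(tokenized_documents)
        self._idf = [
            math.log(n_documents / sum(1 for d in tokenized_documents if token in d))
            for token in indexed_tokens
        ]

    def transform(self, tokenized_document):
        # Raw term count weighted by the inverse document frequency.
        return [tokenized_document.count(token) * idf
                for token, idf in zip(self._indexed_tokens, self._idf)]


corpus = [["my", "second", "document"], ["my", "third", "document"]]
sut = TfIdfTransformer()
sut.create_indexed_tokens(corpus)
vector = sut.transform(corpus[0])

# The index is ['my', 'second', 'document', 'third']. Tokens present in every
# document get weight log(2/2) = 0, while 'second' gets 1 * log(2/1).
assert vector == [0.0, math.log(2), 0.0, 0.0]
```

Because it exposes the same `create_indexed_tokens` and `transform` methods, such a class could be injected into the refactored CountVectorizer in place of TokenTransformer.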

Top Down (London, also known as mockist)

Let’s look into the top down approach. In this approach, the first module that is being implemented is highest level. In our example app it is the Service class.

Because none of the collaborators are implemented yet, the top down approach mocks ALL collaborators of the sut.

This is a very structured approach where the programmer goes down one layer at a time, mocking everything except the sut. This approach implements only enough code to satisfy current requirements and thus it is very compliant with YAGNI. However, it is more vulnerable to refactoring, as we will see shortly, because of its heavy usage of mocks. In this example refactoring means changes to the APIs of internal collaborators. Contrary to the bottom up approach, design decisions in top down are made while writing the tests.

In the top down approach the first class is the Service class. Because none of its collaborators are implemented yet, they are mocked. Usually in the top down method, a mocking framework is used to mock collaborators and mocks are not written manually. However, this is not a hard rule.
In the example, I use unittest.mock from the Python standard library for mocking. There are also alternatives with more advanced features, like testix or pymox.
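
For readers who have not used `unittest.mock` before, here is a short sketch of the features the following tests rely on. The mock names mirror the collaborators in this post, but the snippet is standalone and does not use any of the example classes.

```python
from unittest.mock import Mock

repository = Mock()

# Stub an incoming value: what the collaborator returns to its caller.
repository.text_documents.return_value = ["doc1", "doc2"]
assert repository.text_documents() == ["doc1", "doc2"]

# side_effect with a list returns one item per call, in order.
tokenizer = Mock()
tokenizer.tokenize.side_effect = [["a"], ["b", "c"]]
assert tokenizer.tokenize("a") == ["a"]
assert tokenizer.tokenize("b c") == ["b", "c"]

# Verify outgoing communication: how the collaborator was called.
repository.save([[1, 0], [0, 1]])
repository.save.assert_called_once_with([[1, 0], [0, 1]])
tokenizer.tokenize.assert_any_call("b c")
assert tokenizer.tokenize.call_count == 2
```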

Service

Tests:

from unittest.mock import Mock

def build_service(text_documents, encoded_documents):
    repository_mock = Mock()
    vectorizer_mock = Mock()
    repository_mock.text_documents.return_value = text_documents
    vectorizer_mock.vectorize.return_value = encoded_documents
    service = Service(vectorizer_mock, repository_mock)
    return service, repository_mock, vectorizer_mock

def test_service():
    service, repository_mock, vectorizer_mock = build_service(["doc1", "doc2"],[[1,0,0,1], [1, 1, 1, 4]])
    service.run()
    repository_mock.text_documents.assert_called_once()
    vectorizer_mock.vectorize.assert_called_once_with(["doc1", "doc2"])
    repository_mock.save.assert_called_once_with([[1,0,0,1], [1, 1, 1, 4]])

Note how the assertions also check that each method is called only once, coupling the tests to the implementation even more. Some libraries even assert the order of interactions. I am writing fewer tests here than in the bottom up approach to keep the code short.
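
Ordering can also be asserted with `unittest.mock` itself. In the sketch below both collaborators are created as children of one parent mock (named `manager` here, a hypothetical name), so that the parent records their calls in a single sequence. Service is repeated only so that the snippet runs on its own.

```python
from unittest.mock import Mock, call


class Service:
    # Same as the Service implementation in this post.
    def __init__(self, vectorizer, repository):
        self._vectorizer = vectorizer
        self._repository = repository

    def run(self):
        text_documents = self._repository.text_documents()
        encoded_documents = self._vectorizer.vectorize(text_documents)
        self._repository.save(encoded_documents)


def test_service_call_order():
    manager = Mock()
    repository_mock = manager.repository
    vectorizer_mock = manager.vectorizer
    repository_mock.text_documents.return_value = ["doc1"]
    vectorizer_mock.vectorize.return_value = [[1]]

    Service(vectorizer_mock, repository_mock).run()

    # The parent mock records the calls of its children in order.
    assert manager.mock_calls == [
        call.repository.text_documents(),
        call.vectorizer.vectorize(["doc1"]),
        call.repository.save([[1]]),
    ]


test_service_call_order()
```

Such a test is coupled to the implementation even more tightly than the one above, since any reordering of the calls makes it fail.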

Implementation:

class Service:
    def __init__(self, vectorizer, repository):
        self._vectorizer = vectorizer
        self._repository = repository

    def run(self):
        text_documents = self._repository.text_documents()
        encoded_documents = self._vectorizer.vectorize(text_documents)
        self._repository.save(encoded_documents)

CountVectorizer

Here both Tokenizer and Transformer are mocked because they are not implemented yet and all communications with the collaborators are asserted in tests. The test may look a bit cumbersome because I am not using advanced features of behavioral testing frameworks in python, like pymox or testix.

Tests:

def test_no_documents():
    transformer = Mock()
    tokenizer = Mock()
    sut = CountVectorizer(transformer, tokenizer)
    actual = sut.vectorize([])
    expected = []
    assert actual == expected

def test_single_document():
    transformer = Mock()
    tokenizer = Mock()

    tokenizer.tokenize.return_value = ['this', 'is', 'single', 'document', 'single', 'document']
    transformer.transform.return_value = [1, 1, 2, 2]

    sut = CountVectorizer(transformer, tokenizer)
    actual = sut.vectorize(["this is single document single document"])

    tokenizer.tokenize.assert_called_with("this is single document single document")
    transformer.transform.assert_called_with(['this', 'is', 'single', 'document', 'single', 'document'],
        ['this', 'is', 'single', 'document'])

    expected = [[1, 1, 2, 2]]
    assert actual == expected

def test_many_documents_document():
    transformer = Mock()
    tokenizer = Mock()

    tokenizer.tokenize.side_effect = [['this', 'is', 'single', 'document', 'single', 'document'],
                                      ['my', 'second', 'document'],
                                      ['my', 'third', 'document']]

    transformer.transform.side_effect = [[1, 1, 2, 2, 0, 0, 0],
                                         [0, 0, 0, 1, 1, 1, 0],
                                         [0, 0, 0, 1, 1, 0, 1]]

    sut = CountVectorizer(transformer, tokenizer)
    actual = sut.vectorize(["this is single document single document", "my second document", "my third document"])

    # assert_any_call passes if the mock was called with these arguments at least once.
    tokenizer.tokenize.assert_any_call("this is single document single document")
    tokenizer.tokenize.assert_any_call("my second document")
    tokenizer.tokenize.assert_any_call("my third document")

    indexed_tokens = ['this', 'is', 'single', 'document', 'my', 'second', 'third']

    transformer.transform.assert_any_call(['this', 'is', 'single', 'document', 'single', 'document'],
                                          indexed_tokens)
    transformer.transform.assert_any_call(['my', 'second', 'document'], indexed_tokens)
    transformer.transform.assert_any_call(['my', 'third', 'document'], indexed_tokens)

    expected = [[1, 1, 2, 2, 0, 0, 0],
                [0, 0, 0, 1, 1, 1, 0],
                [0, 0, 0, 1, 1, 0, 1]]

    assert actual == expected

I keep the implementation the same as in bottom up approach for example purposes. In a real scenario, if requirements were only to support one type of tokenizer and transformer, CountVectorizer would have functions that performed the operations. If later new tokenizers/transformers were added to requirements, the classes would be extracted.

Would we change the tests if we had several derivatives of Tokenizer and Transformer? Most likely not, because the main concern of the tests is behavior (interactions with collaborators). This is an interesting point. Top Down is, in some sense, more robust against new implementations of a collaborator, since the collaborators are mocked and the tests never see the concrete classes. On the flip side, the tests are tied to the collaborator's API, so they must be rewritten when that API changes, as we will see in the refactor example.

Implementation:

class CountVectorizer:
    def __init__(self, transformer, tokenizer):
        self._transformer = transformer
        self._tokenizer = tokenizer

    def vectorize(self, text_documents):
        tokenized_documents = self._tokenize(text_documents)
        indexed_tokens = self._build_indexed_tokens(tokenized_documents)
        vectors = self._vectorize(tokenized_documents, indexed_tokens)
        return vectors

    def _tokenize(self, text_documents):
        return [self._tokenizer.tokenize(d) for d in text_documents]

    def _vectorize(self, tokenized_documents, indexed_tokens):
        return [self._transformer.transform(d, indexed_tokens) for d in tokenized_documents]

    def _build_indexed_tokens(self, tokenized_documents):
        indexed_tokens = []

        for document in tokenized_documents:
            for token in document:
                if not token in indexed_tokens:
                    indexed_tokens.append(token)
        return indexed_tokens

Tests and implementation of tokenizer and transformer are the same as in bottom up.

Refactoring in Top Down

Now let’s reproduce the refactoring scenario that we had in the bottom up approach. The new Transformer and the refactored CountVectorizer are the same as in bottom up. The tests of the transformer also require the same changes as in the bottom up case. However, the tests of CountVectorizer are now failing, unlike in the bottom up approach. This happens because the CountVectorizer tests assert its interactions with its collaborators, and the interactions with Transformer have changed. In the bottom up approach no interactions are asserted, only the final output. Note that while we added new features to Transformer, CountVectorizer was simply refactored.

CountVectorizer tests are failing:

Traceback (most recent call last):
  File "main.py", line 105, in <module>
    test_single_document()
  File "main.py", line 68, in test_single_document
    ['this', 'is', 'single', 'document'])
  File "/usr/lib64/python3.6/unittest/mock.py", line 814, in assert_called_with
    raise AssertionError(_error_message()) from cause
AssertionError: Expected call: transform(['this', 'is', 'single', 'document'], ['this', 'is', 'single', 'document'])
Actual call: transform(['this', 'is', 'single', 'document'])

The reason for the failure is that the transform method is now called with only one argument. We have to change not only the Transformer tests, but also the tests of every class that uses Transformer as a collaborator.

The refactored tests may look like this:

def test_no_documents():
    transformer = Mock()
    tokenizer = Mock()
    sut = CountVectorizer(transformer, tokenizer)
    actual = sut.vectorize([])
    expected = []
    assert actual == expected

def test_single_document():
    transformer = Mock()
    tokenizer = Mock()

    tokenizer.tokenize.return_value = ['this', 'is', 'single', 'document', 'single', 'document']
    transformer.transform.return_value = [1, 1, 2, 2]

    sut = CountVectorizer(transformer, tokenizer)
    actual = sut.vectorize(["this is single document single document"])

    tokenizer.tokenize.assert_called_with("this is single document single document")

    transformer.create_indexed_tokens.assert_called_with([['this', 'is', 'single', 'document', 'single', 'document']])
    transformer.transform.assert_called_with(['this', 'is', 'single', 'document', 'single', 'document'])

    expected = [[1, 1, 2, 2]]
    assert actual == expected

def test_many_documents_document():
    transformer = Mock()
    tokenizer = Mock()

    tokenizer.tokenize.side_effect = [['this', 'is', 'single', 'document', 'single', 'document'],
                                      ['my', 'second', 'document'],
                                      ['my', 'third', 'document']]

    transformer.transform.side_effect = [[1, 1, 2, 2, 0, 0, 0],
                                         [0, 0, 0, 1, 1, 1, 0],
                                         [0, 0, 0, 1, 1, 0, 1]]

    sut = CountVectorizer(transformer, tokenizer)
    actual = sut.vectorize(["this is single document single document",
                            "my second document",
                            "my third document"])

    # assert_any_call passes if the mock was called with these arguments at least once.
    tokenizer.tokenize.assert_any_call("this is single document single document")
    tokenizer.tokenize.assert_any_call("my second document")
    tokenizer.tokenize.assert_any_call("my third document")

    transformer.create_indexed_tokens.assert_called_with([['this', 'is', 'single', 'document', 'single', 'document'], ['my', 'second', 'document'], ['my', 'third', 'document']])
    transformer.transform.assert_any_call(['this', 'is', 'single', 'document', 'single', 'document'])
    transformer.transform.assert_any_call(['my', 'second', 'document'])
    transformer.transform.assert_any_call(['my', 'third', 'document'])

    expected = [[1, 1, 2, 2, 0, 0, 0],
                [0, 0, 0, 1, 1, 1, 0],
                [0, 0, 0, 1, 1, 0, 1]]

    assert actual == expected

Note the new assertion on create_indexed_tokens, the new method in the transformer API, and how the assertions on the transform method changed to match its new signature.
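
A related risk with plain `Mock()` objects is that they accept any call, so a test can keep passing against a collaborator API that no longer exists. One possible mitigation, assuming the real class is already available (which is not always the case in a strict top down flow), is `create_autospec` from `unittest.mock`, which checks calls against the real signatures. This is a sketch and not something the tests above do. The TokenTransformer below is a stub that carries only the new signatures.

```python
from unittest.mock import create_autospec


class TokenTransformer:
    # Only the signatures matter for this sketch; bodies are omitted.
    def create_indexed_tokens(self, tokenized_documents):
        raise NotImplementedError

    def transform(self, tokenized_document):
        raise NotImplementedError


transformer = create_autospec(TokenTransformer, instance=True)
transformer.transform.return_value = [1, 1]

# The new one-argument call is accepted by the mock.
assert transformer.transform(["a", "b"]) == [1, 1]

# The old two-argument call is rejected by the mock itself.
try:
    transformer.transform(["a", "b"], ["a", "b"])
except TypeError:
    old_style_call_rejected = True
else:
    old_style_call_rejected = False

assert old_style_call_rejected
```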

Summary

So, which approach is better? As both methods have advantages and disadvantages, it is a subjective question. In my opinion, the correct question is not “which” but “when.”

When is Top Down beneficial?

When a programmer is writing a complex controller module, responsible for bringing all necessary pieces of data from out of process dependencies (like network or disk) and the controller is too complex (probably a sign of bad design) to test with only integration tests, the Top Down approach can be beneficial. Mocking all these out of process collaborators will be easier using a Top Down approach and no unnecessary complexity will be created. Basically, this approach is more concerned with interactions of collaborators (behavior) than output or state. If we had many derivatives of Tokenizers and Transformers in our application with stable API, the tests of CountVectorizer would not change. The tests wouldn’t even know about the existence of these derivatives. The tests care only about interactions.

When is Bottom Up beneficial?

When a programmer is writing the core functionality of a business use case, usage of out-of-process dependencies should be kept to a minimum, and mocking the remaining in-process collaborators only makes the tests fragile, as we saw in the refactoring example. In that case a Bottom Up approach is more beneficial, because tests usually assert only output or state and make no assertions on interactions with collaborators. Overall, code written in a bottom up approach tends to be more functional, as the tests are only concerned with the input data and output data of the sut.

Yuri Shafet is a security researcher and team lead at Rezilion. He would like to thank Yoav Kleinberger, Lior Zur-Lotan and Tal Kopel for their thoughtful reviews.
