Why I wouldn't use a dynamically typed language for large applications

April 25, 2014 topic: Design+Architecture · Musing tagged: C++ · Python

Currently I’m developing and maintaining a large code base that’s a mix of C++ and Python code. In this article I write about my experience with the type systems of statically typed C++ and dynamically typed Python, especially the situations where I miss explicit type information in dynamically typed languages.

I use Python as a synonym for dynamically typed languages because my experiences with Python should also apply to other dynamically typed languages.

I want to state that I really like Python. I use Python every day and it’s my language of choice for small scripts to medium-sized programs. Its standard library is phenomenal, and Python is a great tool for making quick prototypes, tools etc.

But now let’s come to the drawbacks when it is used for bigger applications:

Refactoring and maintaining Python code is significantly more difficult and time-consuming than doing the same with a language with a static type system like Java or C++.

Let me explain why …

Understanding an unknown code base is hard - dynamically typed languages make it even harder

In the last few months one of my main tasks was maintenance: debugging, fixing tons of bugs and making some necessary changes, in both the C++ and the Python part of the application. I have to say that I’m NOT the author of 99% of the code base, so the majority of my effort in the last months went into understanding code I did not know. In this situation I consider the missing type information in dynamically typed languages a major disadvantage. I found it much harder to understand an unknown code base in Python than in statically typed C++, simply because there was no type information for variables and function prototypes.

Let me give a simple example that illustrates this problem: Given the function prototype int add(int x, int y) in a statically typed language like C++, it’s reasonably clear what this function does, and which values it accepts and returns, just by reading the prototype. The static type annotations are documentation that helps to understand the code. And this “documentation” is always in sync with the code - the compiler will tell you if it is not.

In a dynamically typed language the function prototype would be add(x,y). When I read this line in Python, I can guess from the context, or I have to look at the (hopefully existing) documentation of the function or directly at the code to determine what the function does. The drawback of missing type information becomes even more evident if the methods or functions take objects as arguments that are part of a complex class hierarchy: IEntity activate(IEntity entity) contains more information than activate(entity). What is “entity” in the latter case? It can be anything!
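A minimal Python sketch of the add(x,y) case may make this concrete. The function body is my assumption of the most obvious implementation; the point is only that the prototype accepts anything that supports "+", and a mismatch shows up when the line is executed, not before.

```python
def add(x, y):
    # No types in the prototype: any pair of objects that supports "+"
    # is accepted, so the signature alone does not say what is expected.
    return x + y


print(add(1, 2))        # 3
print(add("1", "2"))    # 12 (string concatenation)
print(add([1], [2]))    # [1, 2] (list concatenation)

try:
    add(1, "2")
except TypeError as error:
    # The mismatch is reported only when this call is actually executed.
    print("detected at run time:", error)
```

With int add(int x, int y), the last three calls would not compile at all.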

In statically explicitly typed languages, the function prototype IS the interface definition.

In dynamically typed languages, the function prototype is an INCOMPLETE interface definition.

So for the task of understanding existing code, explicit type information is a powerful tool (and not just overhead, as is often stated). Dynamically typed languages don’t provide this feature. As a result, it’s harder to understand a larger dynamically typed code base.

Refactoring is hard - dynamically typed languages make it even harder

Static type information also helps when refactoring code. I’m not just speaking of automatic refactoring, where the IDE support for statically typed languages is usually better, but also of the manual approach of moving code around, renaming functions and variables etc.

When you make any (non-logical) error during refactoring in a statically typed language, the chances are high that this error is detected by your compiler, or even before compilation by the syntax or semantic highlighting capabilities of your IDE. A good IDE for dynamically typed languages can also detect some of these errors, but you still have to execute the code to be sure it works. Unit tests do help in this case. But unit tests usually check behavior and not types, so they catch fewer type errors than a compiler does. It gets worse if no unit tests exist at all (sadly this is very often reality). In this case you must run the application (please don’t tell me I should write comprehensive tests for legacy code) and make sure to manually invoke the actions that cover the refactored code. Depending on the application, this can be really time-consuming or even impractical.
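The following Python sketch shows the kind of refactoring mistake I mean. All names (Entity, enable, process, legacy_mode) are hypothetical and only serve the illustration: a method was renamed, and one call site in a rarely executed branch was missed.

```python
class Entity:
    def __init__(self, name):
        self.name = name
        self.active = False

    # Renamed from "activate" to "enable" during a refactoring.
    def enable(self):
        self.active = True
        return self


def process(entity, legacy_mode=False):
    if legacy_mode:
        # Call site that was missed during the rename. Python loads the
        # module without complaint, because the attribute is looked up
        # only when this line is executed.
        return entity.activate()
    return entity.enable()


# The common path works, so the mistake goes unnoticed ...
print(process(Entity("door")).active)

# ... until something finally reaches the stale branch.
try:
    process(Entity("door"), legacy_mode=True)
except AttributeError as error:
    print("found only at run time:", error)
```

In C++ or Java the stale call would be rejected at compile time, whether or not any test or manual click path ever reaches that branch.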

Explicit types for function prototypes clearly specify the expected input and output. The compiler automatically checks this specification against the implementation (a lightweight form of formal verification). A cool feature in my opinion. These checks can catch a lot of the errors you make when refactoring or modifying existing code.

Dynamically typed languages are missing this feature. As a result, it takes way more effort to get refactored dynamically typed code right.

Conclusion

I think a year ago I would have chosen Python for almost any application (where performance and timing are not an issue). Now that I have had some painful experiences with maintaining a large Python code base, I think differently. If I had to build a large application from scratch, I would not use a dynamic language as the primary language unless dynamic runtime behavior is a fundamental requirement. When it comes to maintaining a large code base, I view static typing as an important feature for programmers. Of course, dynamic typing adds flexibility, but it also adds complexity that can easily outweigh its benefits when the code base grows and ages and a lot of different people touch the code.

I don’t claim that Python is generally not suitable for large applications - there are obviously large applications successfully built with Python. But maybe they suffer from the same problems and their developers wish they had chosen a language that supports explicit static typing. I don’t know. Strict project guidelines like defensive programming, rigorously applied test-driven development and strict naming conventions might at least help to alleviate some of the disadvantages of dynamic typing.
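As a rough sketch of what defensive programming could look like here (the Entity class and the activate function are hypothetical, loosely following the activate(entity) example from above), a function can check its argument at the boundary and fail with a clear message:

```python
class Entity:
    def __init__(self, name):
        self.name = name
        self.active = False


def activate(entity):
    """Activate an Entity and return it."""
    # Defensive check: fail at the function boundary with a clear message
    # instead of somewhere deep inside the call chain.
    if not isinstance(entity, Entity):
        raise TypeError(f"expected Entity, got {type(entity).__name__}")
    entity.active = True
    return entity


print(activate(Entity("door")).active)

try:
    activate("door")
except TypeError as error:
    print(error)
```

This documents the expected type and shortens the debugging session, but the check still runs only when the call is executed, so it alleviates the problem rather than removing it.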

Python will remain my language of choice for small to medium-sized applications and scripts, but for large applications I have better options - the right tool for the right job.

This article is based on my personal state of knowledge and experience - if you don’t agree, let me know. I appreciate any feedback. Thanks for reading.