Python Limitations and Design Flaws

Dr. Phillip M. Feldman


A. Introduction

Although I'm a Python evangelist, I must admit that Python is not without significant limitations. In the following, I critique Python solely from the perspective of an engineering and scientific application developer, rather than, e.g., from that of a computer science educator (I gave up that line of work in 1994). Also, I treat Python, NumPy, SciPy, and other widely-used open-source Python packages as a whole, which is perhaps slightly unfair to Python. I view the most serious limitations of Python—in no particular order—as the following:

B. Eleven Limitations of Python

B.1 Package Availability and Maturity

Several Matlab toolboxes have no Python counterparts.

Many Python modules/packages are in an immature state of development and/or are poorly documented and/or poorly supported. (The lack of good support is not surprising given that much of this is coming from volunteers who are donating their time). It is worth investigating whether a given module/package is actively maintained before developing a dependence on it; otherwise, one may find oneself in the position of having to devise workarounds and patches for that code!

For scientific and engineering work, NumPy, SciPy, and matplotlib are arguably the most important Python packages. NumPy and matplotlib are generally well documented, but SciPy documentation is often unclear or simply missing.

Here is a specific example. scipy.interpolate.LSQUnivariateSpline fits a smoothing spline to data. The documentation for the method get_coeffs() does not explain the meaning of the coefficients returned by the method. (Given that the method returns fewer coefficients than one would expect, the lack of documentation is problematic).
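
As a minimal sketch of the issue (hypothetical data; the default cubic spline, k=3, is assumed):

    import numpy as np
    from scipy.interpolate import LSQUnivariateSpline

    x = np.linspace(0.0, 10.0, 50)
    y = np.sin(x)
    t = [2.5, 5.0, 7.5]                    # interior knots
    spline = LSQUnivariateSpline(x, y, t)  # cubic (k=3) by default
    c = spline.get_coeffs()
    # Seven coefficients come back for 50 data points. The documentation does
    # not say that these are B-spline basis coefficients over the knot vector
    # (one must infer this from the FITPACK sources), nor why there are
    # len(t) + k + 1 of them.
    print(len(c))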

B.2 matplotlib Plotting Package

matplotlib is a very capable package for non-interactive plotting, but there are some significant problems. Three themes are common to most of the issues with this package:

  1. There is in general a lack of uniformity among the interfaces to the various functions and methods. Here is a concrete example: when one generates a text box using the pyplot.annotate function or the axes object's annotate method, the xycoords keyword specifies whether the text location is given in data coordinates, axes fractional coordinates, or figure fractional coordinates. The pyplot.text function, on the other hand, has no xycoords keyword, and the text location can only be given in data coordinates, which is typically not what is desired.
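
     A minimal illustration of the asymmetry (hypothetical text and coordinates):

      from matplotlib import pyplot

      fig, axes = pyplot.subplots()
      # annotate accepts xycoords, so the location can be given in axes fractions:
      axes.annotate('note', xy=(0.5, 0.9), xycoords='axes fraction')
      # text has no xycoords keyword; x and y are interpreted as data coordinates:
      axes.text(0.5, 0.9, 'note')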

  2. Default behaviors are often not sensible, with the result that producing simple but professional-looking plots tends to require more fiddling than should be required. Here are three examples:

    1. By default, the margin area of a figure has a dark gray background, while titles, axes labels, and other text are black. Black against dark gray produces poor contrast. One can easily solve this problem by changing the margin background color ('facecolor') from dark gray to white, but this requires the additional keyword argument shown in the following snippet:

      fig = pyplot.figure(facecolor=[1, 1, 1])

    2. Plot titles are by default jammed against the upper edge of the plot. Fixing this requires the additional keyword argument shown in the following snippet:

      axes.set_title('text', y=1.02)

    3. When one produces a polar contour plot, the 'y' axis (circumferential axis) labels are by default displayed in degrees while the 'x' axis (radial axis) labels are displayed in radians! A sample plot illustrating this behavior appears below.

      Figure 1: matplotlib mixes degrees and radians
      This plot was generated via the script inconsistent_angle_units.py.
  3. The interface puts too much burden on the programmer's memory (and forces frequent trips to the reference documentation). Here are a few examples:

    1. Although Python and most Python packages follow the computer science convention of counting from zero, matplotlib counts subplots starting from one.
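
     For example:

      from matplotlib import pyplot
      pyplot.subplot(2, 2, 1)   # the first subplot of a 2x2 grid is number 1, not 0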

    2. The functions contour and contourf can be called with up to four positional arguments, with the interpretation of the fourth argument depending on its type. This excessive use of positional arguments and type-dependent behaviors, which is reminiscent of Matlab, creates the potential for confusion. Use of alternative keyword arguments (with suitable error checks for the presence of conflicting keywords) would be a better design.
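
     For example, the interpretation of the fourth positional argument depends on whether it is an integer or a sequence (hypothetical data):

      import numpy as np
      from matplotlib import pyplot

      x = np.linspace(-2.0, 2.0, 50)
      X, Y = np.meshgrid(x, x)
      Z = X**2 + Y**2
      pyplot.contour(X, Y, Z, 10)                 # int: about 10 automatically chosen levels
      pyplot.contourf(X, Y, Z, [0.5, 1.0, 2.0])   # sequence: exactly these levels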

    3. To annotate a figure containing multiple subplots, with the annotation location specified in figure fractional coordinates, one must arbitrarily choose one of the subplots and apply the annotation to it. I had expected to invoke an annotate method of the figure object, but there is no such method. I suspect that most programmers would find this behavior confusing.
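
     A sketch of this workaround (hypothetical text and coordinates):

      from matplotlib import pyplot

      fig, (ax1, ax2) = pyplot.subplots(1, 2)
      # There is no figure-level annotate method; one arbitrarily picks a
      # subplot and gives the location in figure fractional coordinates:
      ax1.annotate('figure-level note', xy=(0.5, 0.02),
                   xycoords='figure fraction', ha='center')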

B.3 Named Constants

There is no mechanism for defining a named constant in Python. (When one defines a constant in a language such as C++, this instructs the compiler that any attempt to change the value should be treated as an error).
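
The best one can do at run time is a workaround along the following lines; this is a sketch (the class name is hypothetical), not an established standard-library facility:

    class _Constants(object):
        """Namespace whose attributes cannot be rebound once the class is defined."""
        PI = 3.14159

        def __setattr__(self, name, value):
            raise TypeError('constants cannot be reassigned')

    const = _Constants()
    print(const.PI)   # 3.14159
    const.PI = 3.0    # raises TypeError (though _Constants.PI = 3.0 still slips through)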

B.4 Nested Loop Flow Control

There is no clean way to break out of two or more nested loops. One must do one of the following: set a flag variable and test it in each enclosing loop, raise an exception and catch it outside the outermost loop, or move the loops into a function and return from it.
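
Of these, moving the loops into a function is arguably the cleanest; a minimal sketch (the function and data are hypothetical):

    def find_pair(items, target):
        """Return the first pair of items summing to target, or None."""
        for i, x in enumerate(items):
            for y in items[i + 1:]:
                if x + y == target:
                    return (x, y)   # a single return exits both loops at once
        return None

    print(find_pair([3, 5, 8, 11], 13))   # (5, 8)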

B.5 Limitations of scipy.signal

scipy.signal is of limited utility for engineering applications. I'm going to focus here on what I see as one of the key limitations—the filtering functionality. I'd like to see the following:
  1. support for lowpass, bandpass, and bandstop FIR filter design, with the user specifying (a) the passband ripple, (b) the minimum stopband rejection, and (c) optionally, the design method. Rather than forcing the user to specify the order of the filter, which requires many iterations to determine the minimum order that will do the job, the code should automatically determine the minimum filter order that can meet the user's specs. (A sketch of the manual iteration appears after this list.)

  2. support for fixed-binary-point arithmetic.

  3. support for filtering and the design of filters that use fixed-binary-point arithmetic.
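
To make item 1 concrete, the following sketch shows the order search that users must currently perform by hand (and that the library could perform internally). The specs are hypothetical, and a SciPy version recent enough for remez to accept the fs keyword is assumed:

    import numpy as np
    from scipy import signal

    fs = 1000.0                        # sampling rate, Hz (hypothetical)
    f_pass, f_stop = 100.0, 150.0      # band edges, Hz
    ripple_db, atten_db = 0.5, 60.0    # passband ripple and stopband rejection specs

    for numtaps in range(11, 501, 2):  # search odd tap counts (type I filters)
        taps = signal.remez(numtaps, [0, f_pass, f_stop, 0.5 * fs], [1, 0], fs=fs)
        w, h = signal.freqz(taps, worN=4096)
        f = w * fs / (2.0 * np.pi)     # convert rad/sample to Hz
        mag = np.abs(h)
        passband = mag[f <= f_pass]
        ripple = 20.0 * np.log10(passband.max() / passband.min())
        atten = -20.0 * np.log10(mag[f >= f_stop].max())
        if ripple <= ripple_db and atten >= atten_db:
            print('%d taps meet the specs' % numtaps)
            break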

Such changes would be a big step in the direction of making Python+NumPy+SciPy a viable alternative to Matlab + the Matlab Signal Processing Toolbox.

As an aside, I'd like to comment on the documentation for `scipy.signal.kaiserord`, which says the following:

scipy.signal.kaiserord(ripple, width)

<snip>
ripple : float

Positive number specifying maximum ripple in passband (dB) and minimum ripple in stopband.

When designing a lowpass digital filter, one normally specifies the maximum ripple in the passband and the minimum rejection in the stopband. With this function, there is no way to specify how much rejection one gets in the stopband, and the filter design code is apparently also trying to limit stopband ripple, which is something that no engineer would care about. This cannot be merely a wording problem in the documentation: if stopband rejection could be specified independently, the function would need another parameter for it.

B.6 Support for Microsoft Excel

Support for reading and writing of Microsoft Excel files is unsatisfactory.

Footnote: Someone I respect opined that "Using Excel is the problem here". For better or worse, Excel is ubiquitous in engineering organizations, and one cannot always dictate the format in which data to be analyzed is provided.

B.7 Mathematical Optimization

SciPy includes the scipy.optimize package for performing optimization. There are numerous issues with this package:

  1. Several of the algorithms are rather dated. For example, it appears that SciPy implements the original, 1965 version of the Nelder-Mead algorithm. This version may fail to converge or converge to a non-solution. And even when it does converge to a valid solution, convergence may be very slow. An article by Saša Singer and John Nelder discusses some of these issues and proposes a more efficient version of the algorithm.

  2. The package needs a uniform mechanism for specifying termination conditions (iteration limits and tolerances). The lack of this makes experimentation with alternative optimization algorithms more cumbersome.

  3. scipy.optimize does not provide any capabilities for dividing work over multiple cores in a single computer, or over multiple nodes in a cluster of computers.

  4. The scipy.optimize.brute solver implements a brute force grid search. This is useful, but there are situations in which one does not know a priori what grid spacing to use, and would like to let the solver continue to search until a solution of some minimal quality is found. Something like this can be implemented using subrandom numbers, which cover the domain of interest quickly and evenly without requiring advance specification of the number of points. (See, e.g., the Wikipedia article on subrandom numbers; a minimal generator appears after this list.)

  5. When investigating a class of related optimization problems, it is sometimes important to be able to determine whether multiple local minima typically occur, and if so, how accurately the starting point of the search must be specified for a given optimization algorithm to converge to the global minimum. Currently, scipy.optimize.brute cannot be used to study such problems because its finishing (stage-2) search is performed using only the best result from the initial (stage-1) grid search. An option to perform a finishing search for a specified fraction of the grid points, with output including statistics on the number of distinct local minima found, would make this function far more useful.

  6. The addition of one or more genetic optimization algorithms would be welcome.

  7. The scipy.optimize.brute solver permits one to combine brute force grid search with a second stage of optimization, but there is no mechanism for passing termination conditions or other options to the second-stage optimizer.
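
Regarding item 4 above: subrandom sampling needs no special library support; a minimal, self-contained van der Corput/Halton generator might look like the following sketch:

    def van_der_corput(index, base):
        """Return element index (1, 2, 3, ...) of the base-`base` subrandom sequence."""
        result, f = 0.0, 1.0 / base
        while index > 0:
            result += f * (index % base)
            index //= base
            f /= base
        return result

    # 2-D subrandom points: pair the base-2 and base-3 sequences. The grid
    # spacing never has to be chosen in advance; one simply takes more points
    # until the desired solution quality is reached.
    points = [(van_der_corput(i, 2), van_der_corput(i, 3)) for i in range(1, 101)]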

The mystic package, of which Mike McKerns is the primary developer, appears to offer many advantages over scipy.optimize, but is not currently well enough documented to be a usable alternative.

B.8 Reading of Configuration Files (ini Files)

Python's Standard Library provides a module called ConfigParser for extracting parameter values (e.g., simulation model parameters) from ini (configuration) files, but this module was not well designed and is not a practical tool for serious simulation projects.
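
One representative annoyance, shown as a minimal sketch (in Python 3 the module is spelled configparser): every value is returned as a string, so each numeric parameter must be converted by hand.

    from configparser import ConfigParser

    ini_text = "[model]\nnum_trials = 1000\nstep_size = 0.01\n"
    parser = ConfigParser()
    parser.read_string(ini_text)
    print(parser['model']['num_trials'])          # '1000' -- a string, not an int
    print(parser.getfloat('model', 'step_size'))  # 0.01 -- conversion is manual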

B.9 Operations on Arrays of Strings

NumPy supports arrays of strings, but such arrays suffer from special disabilities. Because NumPy's .min() and .max() methods work for numeric arrays, and Python's min() and max() functions work for strings, one might reasonably expect NumPy's .min() and .max() methods to work for arrays of strings, but they don't, as the following IPython session demonstrates:

In [1]: from numpy import array

In [2]: array([['dd', 'de', 'cc'], ['ae', 'be', 'hf']]).max(axis=0)
TypeError: cannot perform reduce with flexible type
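
NumPy's sort function does, oddly, support string arrays, so the elementwise maximum can be recovered indirectly. A workaround sketch, continuing the same session (the dtype shown assumes Python 3 Unicode strings):

In [3]: from numpy import sort

In [4]: sort(array([['dd', 'de', 'cc'], ['ae', 'be', 'hf']]), axis=0)[-1]
Out[4]: array(['dd', 'de', 'hf'], dtype='<U2')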

B.10 Block Comments

Python provides no true block comments. A consequence is that there is no mechanism that is both fast and bullet-proof for temporarily deactivating a large block of code—one must either laboriously comment out each line, or simply delete the block. Support for block comments should be added, with the implementation allowing a block comment to contain ordinary comments and/or nested block comments.

Footnote: Python supports strings that span multiple lines; these are delimited by triple quotes. It is sometimes claimed that triple-quoted strings can be used for block commenting, but this is problematic. To see why, suppose that one wishes to comment out lines 20 through 50, and that lines 30 through 40 are a triple-quoted string. Enclosing lines 20 through 50 in triple quotes will convert lines 20 through 30 and lines 40 through 50 into triple-quoted strings, but the contents of lines 30 through 40 will no longer be inside any string, and will instead be parsed as code. What's needed is a pair of markers that unambiguously indicate the start and end of a block comment, e.g., #* and *#. (The Haskell programming language uses {- and -}.)

B.11 The Python Interpreter Allows Statements that Make No Sense

The Python interpreter allows some statements that a reasonable observer could only interpret as mistakes. The following are just two examples.

Example #1: The statement d = {'a': 1, 'a': 2} is equivalent to d = {'a': 2}: the duplicate key 'a' is silently accepted, and its first value is silently discarded.

Example #2: Suppose that the variable a has some scalar value. Aside from consuming a few CPU cycles, the statement a == 4 does absolutely nothing: the result of the comparison is computed and immediately discarded. Clearly the person who wrote this meant either to assign to a or to do something with the result of the comparison. In either case, the statement as coded should be treated as invalid.

C. News Items

Oct. 21, 2017: I've been griping to Enthought for a couple of years about the issues with interpreter and package configuration control (see item 1 in the Archive section below), and am very pleased to be able to report that they listened. I've been playing with the recently-released Enthought Deployment Manager (EDM), and my preliminary opinion is that it provides a powerful and easy-to-use toolset for managing multiple configurations of the interpreter and packages.

EDM allows one to create multiple environments, each with its own version of the interpreter and a completely independent set of package files. If one updates package X, and that update depends on updated versions of packages Y and Z, Y and Z are automatically upgraded as well. If one downgrades, the same thing works in the opposite direction.

[Canopy 1.X used Python's built-in virtual environment facility for user Python environments. This model worked well for pure Python packages, but tended to be fragile for extension packages (binary libraries), and also required a high level of coordination between the maintainers of base and child environments. For example, NumPy with MKL was fragile, depending on the sequence and location of update events.]

D. Archive

This section is a repository for limitations that have been addressed.

Currently, there is only one item here, but I'm hopeful that there will be more over time.

1. Interpreter and Package Version Control

All non-trivial Python applications depend on packages (or modules) that are not part of the top-level script. Although Python handles package imports in a much cleaner fashion than some other languages (e.g., Matlab), there are still issues that frequently bedevil Python developers:

  1. If application X depends on one version of a package, while application Y depends on a different version, there is currently no clean way to handle this. It would be great if there were some way to specify the version of a package that is to be imported, or an acceptable range of version numbers. I'm not sure in practice how this would be implemented; it could potentially use a scheme similar to that employed by the Windows OS for the registration of DLLs.

  2. Because one can discover at most one missing dependency per execution, tracking down dependencies by running a script can be a time-consuming proposition. I would love to have a tool that automatically analyzes a Python script or package, determines all dependencies, and reports all missing packages. As far as I know, nothing like this exists.

[Re. item 1. above: When one performs an ordinary Python import, the sequence of folders specified via the PYTHONPATH environment variable is searched for the requested module. One can override this path by modifying the contents of sys.path, which is initialized based on PYTHONPATH, but this is bad practice. One can also use my CustomImporter.py module, which allows one to import a module from a specific location, but this code was not created to solve the problem of version-specific dependencies and provides at best a clumsy workaround.]

Last update: 21 Oct., 2017