Several Matlab toolboxes have no Python counterparts.
Many Python modules/packages are in an immature state of development and/or are poorly documented and/or poorly supported. (The lack of good support is not surprising given that much of this is coming from volunteers who are donating their time). It is worth investigating whether a given module/package is actively maintained before developing a dependence on it; otherwise, one may find oneself in the position of having to devise workarounds and patches for that code!
For scientific and engineering work, NumPy, SciPy, and matplotlib are arguably the most important Python packages. NumPy and matplotlib are generally well documented, but SciPy documentation is often unclear or simply missing.
Here is a specific example.
scipy.interpolate.LSQUnivariateSpline fits a smoothing spline to
data. The documentation for the method
get_coeffs() does not
explain the meaning of the coefficients returned by the method. (Given that the
method returns fewer coefficients than one would expect, the lack of
documentation is problematic).
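A short illustration of the issue (the data and knot locations below are arbitrary):

    import numpy as np
    from scipy.interpolate import LSQUnivariateSpline

    x = np.linspace(0.0, 10.0, 50)
    y = np.sin(x)
    t = [2.5, 5.0, 7.5]                    # interior knots

    spline = LSQUnivariateSpline(x, y, t, k=3)
    coeffs = spline.get_coeffs()

    # 50 data points go in, but only len(t) + k + 1 = 7 coefficients come back
    # (they are B-spline basis coefficients, but the docstring does not say so).
    print(len(x), len(coeffs))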
matplotlib is a very capable package for non-interactive
plotting, but there are some significant problems. Three themes are common to
most of the issues with this package:
There is in general a lack of uniformity among the interfaces to the various functions and methods. Here's a concrete example: When one generates a text box using the pyplot.annotate function or the annotate method, the xycoords keyword can be used to specify whether the text location is specified in terms of data coordinates, axes fractional coordinates, or figure fractional coordinates. When using the pyplot.text function or the text method, on the other hand, there is no xycoords keyword, and the text location can only be specified in data coordinates, which is typically not what one wants.
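A short sketch of this asymmetry (standard matplotlib calls; the text and positions are arbitrary):

    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot([0, 1], [0, 1])

    # annotate() accepts xycoords, so the label can be placed in axes-fraction
    # coordinates...
    ax.annotate('via annotate', xy=(0.05, 0.95), xycoords='axes fraction')

    # ...but text() has no xycoords keyword; its coordinates are interpreted as
    # data coordinates unless one passes a transform explicitly.
    ax.text(0.05, 0.85, 'via text', transform=ax.transAxes)

    plt.savefig('annotate_vs_text.png')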
Default behaviors are often not sensible, with the result that producing simple but professional-looking plots tends to require more fiddling than should be required. Here are three examples:
By default, the margin area of a figure has a dark gray background color, and titles, axes labels, and other text are black. Black against a dark gray background produces poor contrast. One can easily solve this problem by changing the margin background color ('facecolor') from dark gray to white, but this requires explicitly overriding the default, as in the following snippet:

    fig = pyplot.figure(facecolor=[1, 1, 1])
Plot titles are by default jammed against the upper edge of the plot. Fixing this requires additional code to push the title upward, as in the snippet below.
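A minimal fix, assuming the pyplot interface (the offset value 1.05 is arbitrary):

    import matplotlib.pyplot as plt

    plt.plot([0, 1, 2], [0, 1, 4])
    # An explicit vertical offset (in axes coordinates) adds breathing room
    # between the title and the top edge of the axes.
    plt.title('Quadratic growth', y=1.05)
    plt.savefig('title_offset.png')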
When one produces a polar contour plot, the 'y' axis (circumferential axis) labels are by default displayed in degrees, while the 'x' axis (radial axis) labels are displayed in radians! A short script along the lines of the sketch below reproduces this behavior.
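A minimal sketch of such a plot (the data are arbitrary):

    import numpy as np
    import matplotlib.pyplot as plt

    theta = np.linspace(0.0, 2.0 * np.pi, 181)   # angular coordinate, in radians
    r = np.linspace(0.0, 2.0, 51)                # radial coordinate
    T, R = np.meshgrid(theta, r)
    Z = R * np.cos(3.0 * T)

    ax = plt.subplot(111, projection='polar')
    ax.contourf(T, R, Z)

    # Although theta must be supplied in radians, the circumferential tick labels
    # are rendered in degrees (0, 45, 90, ...), mixing unit conventions.
    plt.savefig('polar_contour.png')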
The interface puts too much burden on the programmer's memory (and on references). Here are a few examples:
Although Python and most Python packages follow the computer science convention of counting from zero, matplotlib counts subplots starting from one.
The pyplot.contourf function can be called with up to four positional arguments, with the interpretation of the fourth argument depending on its type. This excessive use of positional arguments and type-dependent behaviors, which is reminiscent of Matlab, creates the potential for confusion. Use of alternative keyword arguments (with suitable error checks for the presence of conflicting keywords) would be a better design.
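A brief illustration of the type-dependent fourth argument (the data are arbitrary):

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(-2.0, 2.0, 100)
    X, Y = np.meshgrid(x, x)
    Z = np.exp(-(X**2 + Y**2))

    # Fourth positional argument: an integer requests that many automatically
    # chosen levels, while a sequence specifies the exact contour levels --
    # the same argument slot, two different meanings.
    plt.contourf(X, Y, Z, 8)
    plt.contourf(X, Y, Z, [0.1, 0.3, 0.5, 0.7, 0.9])

    # The keyword form removes the ambiguity:
    plt.contourf(X, Y, Z, levels=[0.1, 0.3, 0.5, 0.7, 0.9])
    plt.savefig('contourf_levels.png')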
To annotate a figure containing multiple subplots, with the annotation location specified in figure fractional coordinates, one must choose a subplot at random and apply the annotation to it. I had expected that I would need to invoke an annotate method of the figure object, but there is no such method. I suspect that most programmers would find this counterintuitive.
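A minimal sketch of the work-around (the annotation text and position are arbitrary):

    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(2, 2)

    # There is no Figure.annotate method, so an arbitrary subplot is pressed into
    # service; 'figure fraction' coordinates place the text relative to the figure.
    axes[0, 0].annotate('figure-level note', xy=(0.5, 0.02),
                        xycoords='figure fraction', ha='center')
    plt.savefig('figure_annotation.png')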
There is no mechanism for defining a named constant in Python. (When one defines a constant in a language such as C++, this instructs the compiler that any attempt to change the value should be treated as an error).
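A two-line illustration: the ALL_CAPS naming convention signals intent, but nothing is enforced.

    SPEED_OF_LIGHT = 299792458   # meters per second; 'constant' by convention only
    SPEED_OF_LIGHT = 42          # silently accepted; a C++ 'const' would reject this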
There is no clean way to break out of two or more nested loops. One must do one of the following:
Set a flag before breaking out of the first loop, and then test that flag in each of the outer loops.
Use exception handling.
Move the nested loops to be broken out of into a function, and use the
return statement to terminate the function.
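A minimal sketch of the third approach (the function and data are illustrative only):

    def find_first(matrix, target):
        """Return (row, col) of the first occurrence of target, or None."""
        for i, row in enumerate(matrix):
            for j, value in enumerate(row):
                if value == target:
                    return (i, j)    # terminates both loops at once
        return None

    print(find_first([[1, 2], [3, 4]], 3))   # (1, 0)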
Support for reading and writing of Microsoft Excel files is unsatisfactory. In particular:
Although a single package—xlrd—can read both the older 2000/2003 and the newer 2007+ Excel file formats, separate packages with different interfaces must be used to write Excel files in these two formats: the xlwt and openpyxl packages can be used to write Excel files in the older and newer formats, respectively. As of this writing, I know of no open source Python code that answers the need for a single interface that works transparently across both of these formats.
When working with Excel files containing column-organized data, one may wish to extract data from columns whose numbers are not a priori known, identifying the desired columns by strings appearing in the header row rather than by numbers. xlrd and openpyxl do not support this directly.
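A sketch of the kind of look-up one must hand-roll with openpyxl (the file name and header string are hypothetical):

    from openpyxl import load_workbook

    wb = load_workbook('data.xlsx', read_only=True)   # hypothetical file
    ws = wb.active

    # Find the column by its header string, then pull the data beneath it.
    header = [cell.value for cell in next(ws.iter_rows(min_row=1, max_row=1))]
    col = header.index('Temperature')                 # hypothetical column header
    values = [row[col].value for row in ws.iter_rows(min_row=2)]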
As of this writing, these packages do not permit one to read the formulas from a workbook or write a workbook containing formulas. (If one reads a workbook and writes it out again, all formulas in all worksheets are deleted).
When an Excel file is loaded into memory using
openpyxl, there is no mechanism for determining the memory
occupied by the associated Python data structure. (Excel 2007/2010 files are
compressed, and the memory footprint may thus be much greater than the size of
the file on disk). The inability to determine the occupied memory is
problematic for file caching systems, which almost always operate with a
specified total memory limit, and thus must be able to determine the memory
footprint of each item in the cache.
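The obvious built-in, sys.getsizeof, does not help, because it reports only the shallow size of the top-level object:

    import sys
    from openpyxl import load_workbook

    wb = load_workbook('large_model.xlsx')   # hypothetical file
    # Only the Workbook object itself is measured; the worksheets, rows, and
    # cells it references are excluded, so the result is useless for a cache.
    print(sys.getsizeof(wb))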
xlrd and openpyxl tend to inundate the user with a flood of nuisance messages.
If an Excel 2007/2010 workbook contains chart sheets, openpyxl gets confused about the names and indices of the sheets, so that a request for data from one sheet may produce data from a different sheet.
It appears that the openpyxl package is no longer actively maintained.
Footnote: Someone I respect opined that "Using Excel is the problem here". For better or worse, Excel is ubiquitous in engineering organizations, and one cannot always dictate the format in which data to be analyzed is provided.
SciPy includes the
scipy.optimize package for performing
optimization. There are numerous issues with this package:
Several of the algorithms are rather dated. For example, it appears that SciPy implements the original, 1965 version of the Nelder-Mead algorithm. This version may fail to converge or converge to a non-solution. And even when it does converge to a valid solution, convergence may be very slow. An article by Saša Singer and John Nelder discusses some of these issues and proposes a more efficient version of the algorithm.
The package needs a uniform mechanism for specifying termination conditions (iteration limits and tolerances). The lack of this makes experimentation with alternative optimization algorithms more cumbersome.
The constrained solvers such as
scipy.optimize.fmin_tnc accept only "box" constraints, i.e.,
limits on individual variables. The addition of algorithms that accept
general linear and nonlinear constraints would be welcome.
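For example, fmin_tnc's constraint interface is simply a list of per-variable (lower, upper) pairs:

    from scipy.optimize import fmin_tnc

    def f(x):
        return (x[0] - 1.0)**2 + (x[1] - 2.0)**2

    # Only box constraints -- one (lower, upper) pair per variable -- can be
    # specified; there is no way to express, e.g., x[0] + x[1] <= 2.
    x_opt, n_evals, return_code = fmin_tnc(f, [0.0, 0.0], approx_grad=True,
                                           bounds=[(0.0, 5.0), (0.0, 5.0)])
    print(x_opt)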
scipy.optimize does not provide any capabilities for
dividing work over multiple cores in a single computer, or over multiple
nodes in a cluster of computers.
The scipy.optimize.brute solver implements a brute force grid search. This is useful, but there are situations in which one does not a priori know what grid spacing to use, and would like to let the solver continue to search until a solution of some minimal quality is found. Something like this can be implemented using subrandom numbers, which cover the domain of interest quickly and evenly without requiring advance specification of the number of points. (See, e.g., the Wikipedia article on subrandom numbers).
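A rough one-dimensional sketch of this idea, using a van der Corput (base-2 subrandom) sequence; the objective function, bounds, and quality threshold are illustrative only:

    def van_der_corput(n, base=2):
        """Return the n-th element of the base-`base` van der Corput sequence."""
        q, denom = 0.0, 1.0
        while n:
            n, rem = divmod(n, base)
            denom *= base
            q += rem / denom
        return q

    def subrandom_search(f, lo, hi, good_enough, max_points=10000):
        """Sample subrandom points until f drops below good_enough."""
        best_x, best_f = None, float('inf')
        for i in range(1, max_points + 1):
            x = lo + (hi - lo) * van_der_corput(i)
            fx = f(x)
            if fx < best_f:
                best_x, best_f = x, fx
            if best_f <= good_enough:     # stop as soon as the target is met
                break
        return best_x, best_f

    print(subrandom_search(lambda x: (x - 0.7)**2, 0.0, 1.0, good_enough=1e-6))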
When investigating a class of related optimization problems, it is sometimes important to be able to determine whether multiple local minima typically occur, and if so, how accurately the starting point of the search needs to be specified for a given optimization algorithm to be able to converge to the global minimum. Currently, scipy.optimize.brute cannot be used to study such problems because its finishing (stage-2) search is performed using only the best result from the initial (stage-1) search. An option to perform a finishing search for a specified fraction of the grid points, with output including statistics on the number of distinct local minima found, would make this function far more useful.
The addition of one or more genetic optimization algorithms would be welcome.
The scipy.optimize.brute solver permits one to combine a brute force grid search with a second stage of optimization, but there is no mechanism for passing termination conditions or other options to the stage-2 (finishing) solver.
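A minimal sketch of the two-stage usage in question; the finishing solver is supplied as a bare callable, with no slot for its options:

    from scipy.optimize import brute, fmin

    def f(x):
        return (x[0] - 1.0)**2 + (x[1] + 2.0)**2

    # Stage 1 is the grid search; stage 2 is fmin, but there is no way to hand
    # fmin termination options (xtol, ftol, maxiter, ...) through brute itself.
    x_min = brute(f, ranges=((-5.0, 5.0), (-5.0, 5.0)), Ns=11, finish=fmin)
    print(x_min)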
The mystic package, of which Mike McKerns is the primary developer, appears to offer many advantages over scipy.optimize, but is not currently well enough documented to provide an adequate alternative.
All non-trivial Python applications depend on packages (or modules) that are not part of the top-level script. Although Python handles package imports in a much cleaner fashion than some other languages (e.g., Matlab), there are still issues that frequently bedevil Python developers:
If application X depends on one version of a package, while application Y depends on a different version, there is currently no clean way to handle this. It would be great if there were some way to specify the version of a package that is to be imported, or an acceptable range of version numbers. I'm not sure in practice how this would be implemented; it could potentially use a scheme similar to that employed by the Windows OS for the registration of DLLs.
Because one can discover at most one missing dependency per execution, tracking down dependencies by running a script can be a time-consuming proposition. I would love to have a tool that automatically analyzes a Python script or package, determines all dependencies, and reports all missing packages. As far as I know, nothing like this exists.
[When one executes an import statement, the sequence of folders specified via the PYTHONPATH environment variable is searched for the requested module. One can override this path by modifying the contents of sys.path, which is initialized based on PYTHONPATH, but this is bad practice. One can also use my CustomImporter.py module, which allows one to import a module from a specific location, but this code was not created to solve the problem of version-specific dependencies and provides at best a clumsy workaround.]
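For reference, the discouraged sys.path override looks like this (the folder and module names are hypothetical):

    import sys

    # Prepending a folder makes its contents shadow any same-named module found
    # later on the search path -- effective, but fragile and order-dependent.
    sys.path.insert(0, '/opt/myproject/vendored')   # hypothetical location
    import mymodule                                 # hypothetical module name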
Python's Standard Library provides a module called
ConfigParser for extracting parameter values (e.g., simulation
model parameters) from ini (configuration) files, but this module was not
well designed and is not a practical tool for serious simulation projects. A
few specific issues are the following:
The definition of model parameters and the processing of the ini file have been conflated. These are logically separate steps, i.e., there should be one call to define a specific parameter, and another call to parse the ini file and recover all parameter values appearing in a given section. The parameter definition call would specify the parameter name, the data type, and optionally a default value and help text to be displayed on user request. (A sketch of such an interface appears after this list.)
To understand the benefit of the proposed design, suppose that there are m parameters in a given model and that one wants to recover the parameter values appearing in n sections of the ini file. With the present design, one needs m times n calls. With the proposed design, one would need m plus n calls. (There would be no need to repeat parameter definitions unless for some reason these change from one section of the ini file to the next, which would be strange).
There is no mechanism for data validation, i.e., for specifying a condition that a given parameter must satisfy. If, for example, an integer parameter represents the number of customers waiting for service, one would like to be able to specify as part of the parameter definition that a positive value is required.
There is no complex number parameter type.
There is no list parameter type. (One would also need to be able to specify the allowed types of objects that a given list can contain; these could be viewed as sub-types).
There is no mechanism for defining help text to be displayed on user request.
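A minimal, runnable sketch of the interface proposed in the first item above, built on top of Python 3's configparser; the class and method names are hypothetical, and validation and help text are included to address the later items:

    import configparser

    class ParameterSet:
        def __init__(self):
            self._defs = {}

        def define(self, name, dtype, default=None, check=None, help_text=''):
            """One call per parameter: name, type, optional default/validator/help."""
            self._defs[name] = (dtype, default, check, help_text)

        def parse_section(self, ini_file, section):
            """One call per section: return a dict of validated parameter values."""
            cp = configparser.ConfigParser()
            cp.read(ini_file)
            values = {}
            for name, (dtype, default, check, _help) in self._defs.items():
                raw = cp.get(section, name, fallback=None)
                value = default if raw is None else dtype(raw)
                if check is not None and value is not None and not check(value):
                    raise ValueError('%s=%r fails validation' % (name, value))
                values[name] = value
            return values

    params = ParameterSet()
    params.define('num_customers', int, default=0, check=lambda n: n >= 0,
                  help_text='Number of customers waiting for service')
    params.define('service_rate', float, help_text='Customers served per minute')

    # m define() calls plus n parse_section() calls, instead of m*n get() calls.
    # (Assumes an ini file 'model.ini' with sections such as [baseline].)
    print(params.parse_section('model.ini', 'baseline'))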
NumPy supports arrays of strings, but such arrays suffer from special disabilities. Because NumPy's .min() and .max() methods work for numeric arrays, and Python's min() and max() functions work for strings, one might reasonably expect the .min() and .max() methods to work for arrays of strings, but they don't, as the following example demonstrates:

In : array([['dd', 'de', 'cc'], ['ae', 'be', 'hf']]).max(axis=0)
TypeError: cannot perform reduce with flexible type
Python provides no true block comments. A consequence is that there is no mechanism that is both fast and bullet-proof for temporarily deactivating a large block of code—one must either laboriously comment out each line, or simply delete the block. Support for block comments should be added, with the implementation allowing a block comment to contain ordinary comments and/or nested block comments.
Footnote #1: Python supports strings that span multiple lines; these are delimited by triple quotes. It is sometimes claimed that triple-quoted strings can be used for block commenting, but this is problematic. To see why, suppose that one wishes to comment out lines 20 through 50, and that lines 30 through 40 are a triple-quoted string. Enclosing lines 20 through 50 in triple quotes will convert lines 20 through 30 and lines 40 through 50 into triple-quoted strings, but the contents of lines 30 through 40 will no longer be recognized as a triple-quoted string. What's needed is a pair of markers that unambiguously indicate the start and end of a block comment, e.g., #* and *#. (The Haskell programming language uses {- and -} for this purpose, and allows block comments to nest.)
Footnote #2: Benjamin Root has proposed alternating between double and single triple-quoting, as in the following example:
>>> a = """
... foobar
...  '''
...  another foobar
...  '''
... """
>>> a
"\nfoobar\n '''\n another foobar\n '''\n"
This idea sounds good, but fails if the code to be commented out contains both
types of triple quotes.
Last update: 20 Feb., 2016