On Jan 7, 2013, at 8:49 AM, Simeon Franklin <[address removed]> wrote:
I'm wondering if there are any language specific anti-patterns in
Python I should be aware of. I'm not thinking things like choosing the
right algorithm or general observations (don't use an ORM) but
wondering if there are any language level features that may not
I can think of legacy issues that are mostly fixed: string
concatenation is now optimized in CPython AFAIK (although I still tend
to join arrays of strings) and there is no need for DSU when
sorting... But for current versions of Python the only things that
jump to mind are observations that hold true for most dynamic
languages: minimize dot lookups, don't overuse objects, etc. Are there
any typical Python gotchas that I'm missing?
Your feedback is appreciated!
I think the most prevalent performance antipattern in modern Python is continuing to care about CPython's performance, rather than optimizing for PyPy :). PyPy is significantly faster, and also tends to perform better with clearly-expressed, straightforward Python code, whereas CPython's performance gotchas are all things that require weird distortions of your original intent.
Once you've optimized your code at an algorithmic level, switching to PyPy will likely dwarf any benefit you might get from mangling all your code to suit your preferred version of CPython's idiosyncrasies.
For example, one piece of performance advice that crops up every so often is "avoid function calls, they're slow". Of course, you can't simply avoid one thing, you have to replace it with something else to accomplish the same purpose. So what to replace it with? Usually a big hairy ball of manually-inlined code. PyPy optimizes by finding small loops and then JITing them. If you manually inline lots of code, that actually makes it *harder* for your code to get automatically JITed.
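To make that concrete, here is a rough sketch of the two styles side by side. The names (area, total_area_with_calls, total_area_inlined, RECTS) are made up for illustration, and the timings will depend on your interpreter and machine, so treat it as something to run yourself rather than as a result.

```python
import timeit


def area(width, height):
    # The small helper that the "avoid function calls" advice wants removed.
    return width * height


def total_area_with_calls(rects):
    # Straightforward version: one function call per rectangle.
    total = 0
    for width, height in rects:
        total += area(width, height)
    return total


def total_area_inlined(rects):
    # Manually inlined version: same arithmetic, no helper call.
    total = 0
    for width, height in rects:
        total += width * height
    return total


# Hypothetical sample data: 1000 (width, height) pairs.
RECTS = [(w, w + 1) for w in range(1000)]

if __name__ == "__main__":
    # Time both versions on whichever interpreter runs this script.
    for func in (total_area_with_calls, total_area_inlined):
        seconds = timeit.timeit(lambda: func(RECTS), number=200)
        print("%s: %.4f s" % (func.__name__, seconds))
```

Run the same file under both CPython and PyPy and compare the gap between the two functions on each before deciding that the inlining is worth it.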
Another is "use tuples instead of classes, dot lookups are slow". PyPy can automatically turn a class into something like a C struct. I don't know if this is faster than using a tuple, but it's definitely fast enough that it's not worth killing readability by changing 'count = self.width * self.height' to 'count = info[0] * info[1]'.
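A small sketch of what that trade looks like in practice. Rect and area_from_tuple are hypothetical names, and the (width, height) ordering of the tuple is an assumption that the tuple version forces every reader to remember.

```python
class Rect(object):
    """Readable version: the attribute names document themselves."""

    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


def area_from_tuple(info):
    # "Optimized" version: info is assumed to be (width, height).
    # Nothing in the code says which index means what.
    return info[0] * info[1]


if __name__ == "__main__":
    print(Rect(3, 4).area())
    print(area_from_tuple((3, 4)))
```

Both compute the same number; only the first one tells you what it is multiplying.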
Another popular bit of advice is "just write it in C if you think it might be slow". In PyPy, it's actually faster in many cases to just write things in Python than to pay the penalty of the FFI transition into C and back again.
As always, the main performance lesson is profile before you do anything. The main antipattern across all languages and all eras is to assume you know how a system will perform before you've tried measuring it.
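If you haven't profiled Python before, a minimal sketch using the standard library's cProfile and pstats modules looks something like this. The workload (build_report, format_line) and the helper profile_call are invented for the example; substitute whatever your program actually does.

```python
import cProfile
import io
import pstats


def format_line(i):
    # Hypothetical per-item work.
    return "item %d: %d" % (i, i * i)


def build_report(n):
    # Hypothetical workload: format n lines and join them.
    lines = [format_line(i) for i in range(n)]
    return "\n".join(lines)


def profile_call(func, *args):
    """Run func(*args) under cProfile; return (result, stats as text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time and show the ten most expensive entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, out.getvalue()


if __name__ == "__main__":
    _, report = profile_call(build_report, 10000)
    print(report)
```

The point is only to look at where the time actually goes before changing anything; the numbers you get on CPython and on PyPy may point at different functions.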