Differences between revisions 8 and 9
Revision 8 as of 2016-09-09 22:31:43
Size: 7053
Editor: RichardDarst
Comment:
Revision 9 as of 2016-09-09 23:46:18
Size: 7251
Editor: RichardDarst
Comment:

These are my personal extensions to Python style. Some of them come from writing research code, where the goal is unclear at the start and you can expect to keep making short-term changes, rather than having a master plan and writing to it.

Style guidelines are useful and important, but after some time it is better to extend them to support a higher level of cognition, which cannot be written down so easily. This also means that there are many times when these guidelines are *not* good: advanced programming involves art and aesthetics, not just following rules. You have to judge when to use each guideline and when not to.

General

  • Local mess is better than global mess
    • For example, it's justified to put single-use imports next to the only function that uses them. But once you have somewhat final code, clean up the mess. (If this sounds bad, it's only for cases where there's a minuscule chance of the code not being deleted or restructured anyway.)
  • Although I am inconsistent, I try to enclose strings in single quotes when they represent constants or machine-readable things, and in double quotes when they are human text (sentences or concepts).
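
A minimal sketch of both points above. The function name `dump_config` and the dictionary keys are hypothetical, chosen only for illustration:

```python
def dump_config(config):
    # Single-use import kept next to the only function that needs it.
    # Move it to the top of the file once the code settles.
    import json

    # Single quotes: machine-readable constants (dictionary keys, modes).
    mode = config.get('mode', 'fast')
    # Double quotes: text meant for a human reader.
    message = "Running in %s mode."%mode
    return json.dumps({'mode': mode, 'message': message}, sort_keys=True)


print(dump_config({'mode': 'slow'}))
```

The local import is the "local mess": if the function is deleted during restructuring, the import goes with it.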

Comments

  • Block comments start with '# ' (with a space), but when you comment out code, do not include a space. This provides a visual difference between comments intended for humans, and comments which are code lines.

    • # This is a comment.  It is distinguished from the line below,
      # that is commented out code, which might be uncommented someday,
      # or ideally removed once its purpose is done. 
      #a = 1
      b = 2
      c = 2
  • It is OK to have a large number of block comments to describe what is going on in the code.
  • If you have a block comment that applies to the following code, but not all the code before the next comment, you can "close" the comment with a single #:

    • # This is a tricky operation, and I want this comment
      # to apply to only the next two lines.  The single `#'
      # means this comment no longer applies.
      a = 5163
      b = 651
      #
      c = 513

Spacing

  • Put space around binary operators, with the amount of space reflecting how loosely the operator binds. Normal binary operators get a space. Remove space between tightly binding things, or add space when needed:
    • a = 1 + 2
      b = 1*2 + 3
      c = 1 + 1   or   5 + 5
      ... for n in range(max(2,page_number-5), page_number)
  • Some things are generally "tightly binding", like
    • %-formatting for strings: print("filename: %s"%filename). This is considered one single string unit, not a binary operator.
  • The basic point is to aggressively add space to make the units easy to pick apart, usually in order of operator precedence (but when the semantic meaning groups things differently, the meaning takes precedence).
  • Comprehensions get extra space compared to list/tuple/dict literals. These have a different cognitive meaning, and you immediately know what to think when you see the first character!
    • a = [1, 2, 3]
      b = [ x for x in range(10) ]
  • Vertical alignment is an extremely important part of readability! It makes multi-line parallelism in the code apparent, decreasing the cognitive effort needed to parse large blocks. Below is a simple example, but a real example would be more complex.
    • # what does this do?  Are they all the same?
         _route_id = prefix + row['route_id'].decode(),
         _service_id = prefix + row['service_id'].decode(),
         trip_id = prefix + row['trip_id'].decode(),
         shape_id = prefix + row['shape_id'].decode() if row.get('shape_id','') else None,
         direction_id = row['direction_id'].decode() if row.get('direction_id','') else None,
         headsign = row['trip_headsign'].decode() if 'trip_headsign' in row else None,
      # this is much better.  The space inside `] .` is pathological and
      # I usually would not do it, but it's justified sometimes.  Note the
      # extra space after the `=' in case a longer name comes someday.
         _route_id     = prefix + row['route_id']  .decode(),
         _service_id   = prefix + row['service_id'].decode(),
         trip_id       = prefix + row['trip_id']   .decode(),
         shape_id      = prefix + row['shape_id']  .decode() if row.get('shape_id','')      else None,
         direction_id  = row['direction_id'] .decode()       if row.get('direction_id','')  else None,
         headsign      = row['trip_headsign'].decode()       if 'trip_headsign' in row      else None,
    • Standard Python style (PEP 8) does not recommend this. It lists extra spaces used to align operators as something to avoid, so this is a deliberate departure.
    • There is a major disadvantage: if you add something longer, then _every_ line needs changing. In this case, it is personal preference whether you keep them aligned, or leave one line unaligned and fix it all up later.
  • Spacing between functions can indicate how closely related they are. For example, if I am writing class methods, __getitem__, __setitem__, and __delitem__ may have zero spacing between them if they are all very short functions. They are one cognitive unit: if I understand one, then I understand all of them. If I am scrolling the file to find a section, I look after each \n\n\n, and there is no purpose in seeing each of these individually.

  • Empty objects have a space: [ ], { }, etc. There's no major justification for this, but they usually occur when there is plenty of space (on a single line) and it makes it more apparent.
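
As an illustration of the last two points, here is a small sketch. The class `Registry` is hypothetical and exists only to show the spacing: the three item-access methods sit together with no blank lines, and the empty dict is written with a space.

```python
class Registry(object):
    """Thin wrapper around a dict (hypothetical example class)."""
    def __init__(self):
        self._data = { }

    # One cognitive unit: three trivial delegating methods, no blank lines.
    def __getitem__(self, key):
        return self._data[key]
    def __setitem__(self, key, value):
        self._data[key] = value
    def __delitem__(self, key):
        del self._data[key]

    def __len__(self):
        return len(self._data)


registry = Registry()
registry['a'] = 1
print(registry['a'], len(registry))
```

Whether the grouped methods read better this way is a judgment call; the grouping only makes sense while each method stays trivially short.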

Naming

  • "Code with an open thesaurus". Translation: think about what you name things when you create them, and make sure each name is short and descriptive.
  • Too long names make things hard to read.
  • Too short names make things hard to read.
  • But balance these depending on usage. A small placeholder can be a metavariable. For example, below the list comprehension variable is "x". This is a common pattern, so there's no need to think about what "x" is. Its identity doesn't matter, and the less you focus on that variable name, the faster you understand what is going on.
    • gtfs_files = [ x for x in gtfs_files if filter_function(x.slug) ]
    • This should only be used when a variable is highly-localized, such as only used in one or two immediately adjacent lines, and follows a common pattern.
  • Within a codebase, some concepts are so common that they get short names. For example, when programming with Django, you can use "qs" instead of "queryset", because the cognitive mapping "qs->queryset" becomes internalized.

  • When naming things a0, a1, a2, do you start from 0 or 1? Start from 0 when the first case is unique somehow (for example, it is a lower bound which is subtracted from everything in a list). If each one is just an element, use a1 and a2 (for example, when comparing every pair of adjacent items in a list).
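
The 0-versus-1 numbering rule can be sketched as follows. Both function names and the `times` argument are hypothetical, used only to show the naming:

```python
def offsets_from_start(times):
    # t0 is special: it is the lower bound subtracted from every element,
    # so the numbering starts from 0.
    t0 = min(times)
    return [ t - t0 for t in times ]


def adjacent_gaps(times):
    # t1 and t2 are just "an element and its neighbor"; neither is special,
    # so the numbering starts from 1.
    return [ t2 - t1 for t1, t2 in zip(times[:-1], times[1:]) ]


print(offsets_from_start([5, 7, 10]))
print(adjacent_gaps([5, 7, 10]))
```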

Structure

  • Merging things together on a line is OK when it forms a single thought:
    • if a:        continue
      if b and d:  continue
      if f:        continue
    • Note that the parallelism argument also applies here.
    • Also, this pattern (and others like it) is common, so it should read as one single thought. You don't think "if a then b; b is continue loop", but "loop restart condition: a".
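
In context, the pattern might look like the sketch below. The function `usable_rows` and the row keys are hypothetical; the point is only that each guard line reads as a single "restart condition":

```python
def usable_rows(rows):
    """Return the rows that pass every restart condition (hypothetical filter)."""
    kept = [ ]
    for row in rows:
        # Loop restart conditions, one thought per line.
        if not row:             continue
        if row.get('deleted'):  continue
        if 'id' not in row:     continue
        kept.append(row)
    return kept


rows = [{ }, {'id': 1}, {'id': 2, 'deleted': True}, {'name': "no id"}]
print(usable_rows(rows))
```

Aligning the `continue` statements makes the conditions easy to scan as a column, at the cost of realigning if a longer condition is added.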

Python/Style (last edited 2016-09-09 23:46:18 by RichardDarst)