1. Confusing operations
This section compares some of Python’s more confusing operations.
1.1 Random sampling with and without replacement
import random
random.choices(seq, k=1) # list of length k, sampled with replacement
random.sample(seq, k) # list of length k, sampled without replacement
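A minimal sketch of the difference, assuming a small made-up list seq and a fixed seed (both are illustrative, not part of the original text):

```python
import random

random.seed(0)  # fixed seed only so that repeated runs behave the same
seq = ['a', 'b', 'c', 'd']  # hypothetical sample data

# With replacement: k may exceed len(seq) and elements may repeat.
with_replacement = random.choices(seq, k=6)

# Without replacement: k must not exceed len(seq); each position is used at most once.
without_replacement = random.sample(seq, k=3)

print(with_replacement)
print(without_replacement)
```

Asking random.sample for more elements than the sequence holds raises ValueError, while random.choices has no such limit.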
1.2 Parameters of the lambda function
func = lambda y: x + y # x is looked up each time the function is called
func = lambda y, x=x: x + y # x is bound once, when the function is defined
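The following sketch shows the two binding behaviours side by side. The names late and early are illustrative only.

```python
x = 10
late = lambda y: x + y        # x is looked up when late() is called
early = lambda y, x=x: x + y  # the current value of x (10) is stored as a default argument

x = 20  # rebinding x afterwards only affects the late-binding version

late_result = late(1)    # uses x == 20
early_result = early(1)  # still uses x == 10
print(late_result, early_result)
```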
1.3 copy and deepcopy
import copy
y = copy.copy(x) # only the topmost level is copied
y = copy.deepcopy(x) # Copy all nested parts
Copying and variable aliasing are easy to confuse with each other.
a = [1, 2, [3, 4]]
# Alias.
b_alias = a
assert b_alias == a and b_alias is a
# Shallow copy.
b_shallow_copy = a[:]
assert b_shallow_copy == a and b_shallow_copy is not a and b_shallow_copy[2] is a[2]
# Deep copy.
import copy
b_deep_copy = copy.deepcopy(a)
assert b_deep_copy == a and b_deep_copy is not a and b_deep_copy[2] is not a[2]
Changes to the alias affect the original variable. The elements in the (shallow) copy are aliases of the elements in the original list, while the deep copy is made recursively, and changes to the deep copy do not affect the original variable.
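As a small illustration of the paragraph above, the sketch below modifies the original list after making an alias, a shallow copy, and a deep copy, then shows which of them observe the change.

```python
import copy

a = [1, 2, [3, 4]]
b_alias = a
b_shallow_copy = a[:]
b_deep_copy = copy.deepcopy(a)

a[0] = 100      # replaces a top-level element: only the alias sees it
a[2].append(5)  # mutates the nested list: the alias and the shallow copy see it

print(b_alias)         # same object as a
print(b_shallow_copy)  # own top level, shared nested list
print(b_deep_copy)     # fully independent
```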
1.4 == and is
x == y # whether the two references have the same value
x is y # whether the two references point to the same object
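A short sketch with lists, where equality and identity clearly differ:

```python
x = [1, 2, 3]
y = [1, 2, 3]  # a separate list with equal contents
z = x          # another name for the same list object

print(x == y, x is y)  # equal values, but two different objects
print(x == z, x is z)  # equal values, and the same object
```

For this reason, is is normally reserved for comparisons against singletons such as None.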
1.5 Determining the type
type(a) == int # exact type match only; ignores subclasses (polymorphism)
isinstance(a, int) # also True for instances of subclasses of int
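The difference shows up as soon as inheritance is involved. In this sketch, Animal and Dog are hypothetical classes used only for illustration.

```python
class Animal:
    pass


class Dog(Animal):
    pass


d = Dog()
exact_match = type(d) == Animal         # False: type(d) is Dog, not Animal
subclass_aware = isinstance(d, Animal)  # True: Dog is a subclass of Animal

# The same applies to built-ins: bool is a subclass of int.
bool_is_int = isinstance(True, int)
print(exact_match, subclass_aware, bool_is_int)
```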
1.6 String Search
str.find(sub, start=None, end=None); str.rfind(...) # Return -1 if not found
str.index(sub, start=None, end=None); str.rindex(...) # Raise a ValueError exception if not found
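A brief sketch of both behaviours on a sample string (the string itself is arbitrary):

```python
text = 'hello world'

pos_found = text.find('world')     # index of the first match
pos_missing = text.find('python')  # -1 signals "not found"

try:
    text.index('python')
except ValueError:
    index_raised = True   # index() reports "not found" with an exception
else:
    index_raised = False

print(pos_found, pos_missing, index_raised)
```

Because -1 is also a valid index, forgetting to check the result of find can silently select the last character; index makes the failure explicit.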
1.7 List backward indexing
This is only a matter of habit. Forward indexing starts from 0; if you want reverse indexing to start from 0 as well, you can use ~, because ~i equals -i - 1.
print(a[-1], a[-2], a[-3])
print(a[~0], a[~1], a[~2])
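The two print statements above assume an existing list a. A self-contained sketch with a made-up list:

```python
a = ['first', 'second', 'third']  # hypothetical data

# ~i is the bitwise complement of i, which equals -i - 1 for integers.
complements = [~0, ~1, ~2]

last_by_negative = a[-1]
last_by_tilde = a[~0]
print(complements, last_by_negative, last_by_tilde)
```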
2. C/C++ User’s Guide
Many Python users migrated from C/C++, and there are some differences in syntax and code style between the two languages, which are briefly described in this section.
2.1 Very Large Numbers and Very Small Numbers
Where the C/C++ convention is to define a very large constant, Python provides inf and -inf directly.
a = float('inf')
b = float('-inf')
2.2 Boolean values
While the C/C++ convention is to use 0 and non-zero values for False and True, Python recommends using True and False directly for Boolean values.
a = True
b = False
2.3 Determining Null
The C/C++ convention for checking null pointers is if (a) and if (!a); in Python, the check for None is
if x is None:
    pass
If you use if not x instead, every other falsy object (such as 0, or a string, list, tuple, or dictionary of length 0) is also treated as if it were None.
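A minimal sketch of why the two checks are not interchangeable. The function describe is a hypothetical helper written for this example.

```python
def describe(x):
    """Classify x, distinguishing None from other falsy values."""
    if x is None:
        return 'missing'
    if not x:
        return 'empty or zero'
    return 'has a value'


print(describe(None))  # only None reaches the first branch
print(describe([]))    # an empty list is falsy, but it is not None
print(describe([0]))   # a non-empty list is truthy
```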
2.4 Swapping values
The C/C++ convention is to define a temporary variable to swap values. With Python’s tuple packing and unpacking, you can do this in one step.
a, b = b, a
2.5 Comparing
The C/C++ convention is to join two conditions with &&. With Python, you can chain the comparisons in one expression.
if 0 < a < 5:
    pass
2.6 Set and Get Class Members
The C/C++ convention is to make class members private and access their values through a series of Set and Get functions. While Python can define the corresponding accessors via @property, @name.setter, and @name.deleter (where name is the name of the property), we should avoid unnecessary abstraction; access through a property can be 4 - 5 times slower than direct attribute access.
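A property is still worthwhile when an attribute needs validation. The sketch below assumes a hypothetical Account class whose balance must not be negative.

```python
class Account:
    def __init__(self, balance):
        self.balance = balance  # assignment goes through the setter below

    @property
    def balance(self):
        return self._balance

    @balance.setter
    def balance(self, value):
        if value < 0:
            raise ValueError('balance must be non-negative')
        self._balance = value


account = Account(10)
account.balance = 25  # looks like plain attribute access, but is validated

try:
    account.balance = -1
except ValueError:
    rejected = True
else:
    rejected = False

print(account.balance, rejected)
```

When no validation or computation is needed, a plain attribute is simpler and avoids the call overhead mentioned above.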
2.7 Input and output parameters of functions
It is customary in C/C++ to list both input and output parameters as arguments to a function, and to change the value of the output parameter via a pointer. The return value of the function is the execution status, and the caller checks it to determine whether the call succeeded. In Python, the caller does not need to check a status return value; the function raises an exception directly when it encounters an error case.
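A sketch of the Python style, using two hypothetical functions, parse_ratio and safe_ratio: the result is the return value, and failure is signalled by an exception rather than a status code.

```python
def parse_ratio(text):
    """Parse 'a/b' and return a / b; raises ValueError or ZeroDivisionError on bad input."""
    numerator, denominator = text.split('/')  # ValueError if there is not exactly one '/'
    return int(numerator) / int(denominator)  # ValueError for non-integers, ZeroDivisionError for b == 0


def safe_ratio(text, default=None):
    """Caller-side handling: fall back to a default only where that makes sense."""
    try:
        return parse_ratio(text)
    except (ValueError, ZeroDivisionError):
        return default


print(parse_ratio('3/4'))
print(safe_ratio('abc'))
print(safe_ratio('1/0', default=0.0))
```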
2.8 Reading Files
Reading a file in Python is much simpler than in C/C++. The opened file is an iterable object that returns one line at a time.
with open(file_path, 'rt', encoding='utf-8') as f:
    for line in f:
        print(line) # The \n at the end is preserved
2.9 File path splicing
Python’s os.path.join automatically adds a / or \ separator between path components, depending on the operating system.
import os
os.path.join('usr', 'lib', 'local')
2.10 Parsing command-line options
While Python can use sys.argv to parse command-line options directly, as in C/C++, the ArgumentParser utility under argparse is more convenient and powerful.
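A minimal argparse sketch. The option names and the argument list passed to parse_args are made up for illustration; in a real script, parse_args() is called without arguments so that it reads sys.argv.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description='Hypothetical example tool')
    parser.add_argument('path', help='input file path')
    parser.add_argument('-n', '--count', type=int, default=1, help='number of repetitions')
    parser.add_argument('-v', '--verbose', action='store_true', help='print extra output')
    return parser


# An explicit list stands in for the command line here.
args = build_parser().parse_args(['data.txt', '--count', '3'])
print(args.path, args.count, args.verbose)
```

Type conversion, default values, and the generated -h/--help text all come for free, which is what makes this more convenient than indexing sys.argv by hand.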
2.11 Calling External Commands
While Python can use os.system to invoke external commands directly, as in C/C++, subprocess.check_output lets you choose whether or not to run the command through the shell, and returns the output of the external command.
import subprocess
# If the external command returns a non-zero exit status, a subprocess.CalledProcessError exception is raised
result = subprocess.check_output(['cmd', 'arg1', 'arg2']).decode('utf-8')
# Collect both standard output and standard error
result = subprocess.check_output(['cmd', 'arg1', 'arg2'], stderr=subprocess.STDOUT).decode('utf-8')
# Execute shell commands (pipes, redirects, etc.); you can use shlex.quote() to quote arguments safely
result = subprocess.check_output('grep python | wc > out', shell=True).decode('utf-8')
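The commands above use placeholder names. As a runnable sketch, the example below uses the running Python interpreter (sys.executable) as the external command, which is an assumption made only so that the example does not depend on other installed tools.

```python
import shlex
import subprocess
import sys

# Successful command: capture and decode its standard output.
output = subprocess.check_output(
    [sys.executable, '-c', 'print("hello")']
).decode('utf-8').strip()

# Failing command: a non-zero exit status raises CalledProcessError.
try:
    subprocess.check_output([sys.executable, '-c', 'import sys; sys.exit(3)'])
except subprocess.CalledProcessError as exc:
    return_code = exc.returncode

# shlex.quote() makes one argument safe to embed in a POSIX shell command string.
quoted = shlex.quote('file name.txt')

print(output, return_code, quoted)
```

Passing the command as a list (without shell=True) avoids shell quoting issues entirely, so shlex.quote is only needed when a shell command string is really required.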
2.12 Do not reinvent the wheel
Don’t reinvent the wheel. Python is described as “batteries included”, which means that its standard library provides solutions to many common problems.
3. Common tools
3.1 Reading and writing CSV files
import csv
# Read and write without header
with open(name, 'rt', encoding='utf-8', newline='') as f: # newline='' leaves line-ending handling to the csv module
    for row in csv.reader(f):
        print(row[0], row[1]) # CSV reads all data as str
with open(name, mode='wt', encoding='utf-8', newline='') as f:
    f_csv = csv.writer(f)
    f_csv.writerow(['symbol', 'change'])
# Read and write with header
with open(name, mode='rt', encoding='utf-8', newline='') as f:
    for row in csv.DictReader(f):
        print(row['symbol'], row['change'])
with open(name, mode='wt', encoding='utf-8', newline='') as f:
    header = ['symbol', 'change']
    f_csv = csv.DictWriter(f, header)
    f_csv.writeheader()
    f_csv.writerow({'symbol': xx, 'change': xx}) # xx: placeholder values
When a field in the CSV file is too large, an error is raised: _csv.Error: field larger than field limit (131072). Fix it by raising the limit.
import sys
csv.field_size_limit(sys.maxsize)
csv can also read tab-separated data.
reader = csv.reader(f, delimiter='\t')
3.2 Iterator tools
A number of iterator tools are defined in itertools, such as tools for selecting subsequences.
import itertools
itertools.islice(iterable, start, stop[, step]) # or islice(iterable, stop); the arguments are positional
# islice('ABCDEF', 2, None) -> C, D, E, F
itertools.filterfalse(predicate, iterable) # Keep only the elements for which predicate is False
# filterfalse(lambda x: x < 5, [1, 4, 6, 4, 1]) -> 6
itertools.takewhile(predicate, iterable) # stop iterating when predicate is False
# takewhile(lambda x: x < 5, [1, 4, 6, 4, 1]) -> 1, 4
itertools.dropwhile(predicate, iterable) # start iterating when predicate is False
# dropwhile(lambda x: x < 5, [1, 4, 6, 4, 1]) -> 6, 4, 1
itertools.compress(iterable, selectors) # select based on whether each element of selectors is True or False
# compress('ABCDEF', [1, 0, 1, 0, 1, 1]) -> A, C, E, F
Sorting, grouping, permutations, and combinations.
sorted(iterable, key=None, reverse=False)
itertools.groupby(iterable, key=None) # group consecutive equal values; iterable should be sorted by the same key first
# groupby(sorted([1, 4, 6, 4, 1])) -> (1, iter1), (4, iter4), (6, iter6)
itertools.permutations(iterable, r=None) # Permutations of length r; each result is a tuple
# permutations('ABCD', 2) -> AB, AC, AD, BA, BC, BD, CA, CB, CD, DA, DB, DC
itertools.combinations(iterable, r) # Combinations of length r; each result is a tuple
# combinations('ABCD', 2) -> AB, AC, AD, BC, BD, CD
itertools.combinations_with_replacement(...) # like combinations, but an element may be chosen more than once
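The requirement that groupby input be sorted is easy to overlook. The sketch below, using made-up data, shows grouping by a key function and what happens when the input is not sorted.

```python
import itertools

# Hypothetical data, already sorted by the grouping key (the first letter).
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']
groups = {key: list(group)
          for key, group in itertools.groupby(words, key=lambda w: w[0])}

# groupby only merges *adjacent* equal keys, so unsorted input yields repeated keys.
unsorted_keys = [key for key, _ in itertools.groupby([1, 4, 6, 4, 1])]
sorted_keys = [key for key, _ in itertools.groupby(sorted([1, 4, 6, 4, 1]))]

print(groups)
print(unsorted_keys, sorted_keys)
```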
Merging multiple sequences.
itertools.chain(*iterables) # Concatenate multiple sequences end to end
# chain('ABC', 'DEF') -> A, B, C, D, E, F
import heapq
heapq.merge(*iterables, key=None, reverse=False) # Merge multiple already-sorted sequences into one sorted sequence
# merge('ABF', 'CDE') -> A, B, C, D, E, F
zip(*iterables) # Stop when the shortest sequence is exhausted, the result can only be consumed once
itertools.zip_longest(*iterables, fillvalue=None) # Stop when the longest sequence is exhausted, the result can only be consumed once
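A short sketch of the two zip variants and of the "consumed once" behaviour, using two made-up lists of different lengths:

```python
import itertools

short = [1, 2]
longer = ['a', 'b', 'c']

pairs = zip(short, longer)
first_pass = list(pairs)   # stops at the shortest input
second_pass = list(pairs)  # the zip iterator is already exhausted

# zip_longest pads the shorter input with fillvalue instead of stopping.
padded = list(itertools.zip_longest(short, longer, fillvalue=0))

print(first_pass, second_pass, padded)
```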
3.3 Counters
A counter counts the number of occurrences of each element in an iterable object.
import collections
# Create
counter = collections.Counter(iterable)
# Frequency
counter[key] # number of occurrences of key (0 if key is absent)
# Return the n most frequent elements and their counts; if n is None, return all elements
counter.most_common(n=None)
# Insert/Update
counter.update(iterable)
counter1 + counter2; counter1 - counter2 # add or subtract counts
# Check whether two sequences consist of the same elements
collections.Counter(list1) == collections.Counter(list2)
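The operations above, applied to concrete sample strings chosen only for illustration:

```python
import collections

counter = collections.Counter('abracadabra')

count_a = counter['a']            # occurrences of 'a'
count_z = counter['z']            # a missing key counts as 0 instead of raising KeyError
top_one = counter.most_common(1)  # list of (element, count) pairs

counter.update('aaa')             # adds three more occurrences of 'a'
count_a_after = counter['a']

# Two strings are anagrams when their element counts are equal.
is_anagram = collections.Counter('listen') == collections.Counter('silent')

print(count_a, count_z, top_one, count_a_after, is_anagram)
```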
3.4 Dict with default values
When accessing a non-existent key, defaultdict sets it to a default value.
import collections
collections.defaultdict(type) # When dict[key] is accessed for the first time, type is called without arguments to provide the initial value of dict[key].
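A typical use is grouping values by key without first checking whether the key exists. The input pairs below are made up for the example.

```python
import collections

pairs = [('fruit', 'apple'), ('veg', 'carrot'), ('fruit', 'pear')]  # hypothetical data

grouped = collections.defaultdict(list)  # missing keys start out as an empty list
for kind, name in pairs:
    grouped[kind].append(name)

# Note: merely reading a missing key also inserts it with the default value.
missing = grouped['dairy']

print(dict(grouped), missing)
```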
3.5 Ordered Dict
import collections
collections.OrderedDict(items=None) # Preserves insertion order when iterating (the built-in dict also does so since Python 3.7)
4. High Performance Programming and Debugging
4.1 Outputting error and warning messages
Outputting messages to standard error
import sys
sys.stderr.write('')
Emitting warning messages
import warnings
warnings.warn(message, category=UserWarning)
# Other values of category include DeprecationWarning, SyntaxWarning, RuntimeWarning, ResourceWarning, FutureWarning
Control the output of warning messages
$ python -W all # Output all warnings, equivalent to setting warnings.simplefilter('always')
$ python -W ignore # Ignore all warnings, equivalent to setting warnings.simplefilter('ignore')
$ python -W error # Convert all warnings to exceptions, equivalent to setting warnings.simplefilter('error')
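The same filters can be applied from within a program. The sketch below, built around a hypothetical function old_api, emits a DeprecationWarning and records it with warnings.catch_warnings, which restores the previous filter settings on exit.

```python
import warnings


def old_api():
    # stacklevel=2 attributes the warning to the caller rather than to this line
    warnings.warn('old_api() is deprecated', DeprecationWarning, stacklevel=2)
    return 42


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')  # same effect as -W always, but only inside this block
    value = old_api()

print(value, [str(w.message) for w in caught])
```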
4.2 Testing in code
Sometimes, for debugging purposes, we want to add some extra code, usually print statements, which can be written as follows.
# in the debug part of the code
if __debug__:
    pass
Once debugging is over, this part of the code is skipped by passing the -O option on the command line:
$ python -O main.py
4.3 Code style checking
Using pylint, you can perform a number of code style and syntax checks to catch errors before running the code.
$ pylint main.py
4.4 Code consumption
Time consumption tests
$ python -m cProfile main.py
Test a block of code for time consumption
# block definition
from contextlib import contextmanager
from time import perf_counter
@contextmanager
def timeblock(label):
    tic = perf_counter()
    try:
        yield
    finally:
        toc = perf_counter()
        print('%s : %s' % (label, toc - tic))
# Code block time consumption test
with timeblock('counting'):
    pass
Some principles of code consumption optimization
- Focus on optimizing where performance bottlenecks occur, not on the entire code.
- Avoid using global variables. Local variables are faster to look up than global variables, and moving code that runs at global scope into a function typically makes it 15-30% faster.
- Avoid using . to access attributes repeatedly. It is faster to use from module import name, and to put a frequently accessed member variable such as self.member into a local variable.
- Use built-in data structures as much as possible. str, list, set, dict, etc. are implemented in C and run quickly.
- Avoid creating unnecessary intermediate variables, and avoid unnecessary calls to copy.deepcopy().
- String splicing, e.g. a + ':' + b + ':' + c, creates a lot of useless intermediate objects; ':'.join([a, b, c]) is much more efficient. Also consider whether string splicing is necessary at all; for example, print(':'.join([a, b, c])) is less efficient than print(a, b, c, sep=':').
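Claims like these are best checked on your own workload. The sketch below uses timeit to compare the two string-building styles; the sample strings and the repetition count are arbitrary, and the relative timings vary with the Python version and with the number and size of the strings, so no particular result is implied.

```python
import timeit

parts = ['alpha', 'beta', 'gamma']  # hypothetical strings


def concat_plus():
    a, b, c = parts
    return a + ':' + b + ':' + c


def concat_join():
    return ':'.join(parts)


# timeit.timeit accepts a callable and returns the total time in seconds.
t_plus = timeit.timeit(concat_plus, number=100_000)
t_join = timeit.timeit(concat_join, number=100_000)

print(f'+    : {t_plus:.4f} s')
print(f'join : {t_join:.4f} s')
```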
5. Other Python tricks
5.1 argmin and argmax
items = [2, 1, 3, 4]
argmin = min(range(len(items)), key=items.__getitem__)
argmax works the same way, with max in place of min.
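Both variants side by side, on the same sample list:

```python
items = [2, 1, 3, 4]

# The candidates are the indices; they are compared by the value stored at each index.
argmin = min(range(len(items)), key=items.__getitem__)
argmax = max(range(len(items)), key=items.__getitem__)

print(argmin, argmax)
```

When the smallest or largest value occurs more than once, min and max return the first index at which it appears.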
5.2 Transposing two-dimensional lists
A = [['a11', 'a12'], ['a21', 'a22'], ['a31', 'a32']]
A_transpose = list(zip(*A)) # list of tuple
A_transpose = list(list(col) for col in zip(*A)) # list of list
5.3 Expanding a one-dimensional list into a two-dimensional list
A = [1, 2, 3, 4, 5, 6]
# Preferred: group consecutive elements into tuples of 2.
list(zip(*[iter(A)] * 2))
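The one-liner works because [iter(A)] * 2 repeats the same iterator object twice, so zip pulls two consecutive elements for each tuple. The sketch below spells that out and, as an alternative that is not from the original text, shows a slicing version that keeps a trailing remainder.

```python
A = [1, 2, 3, 4, 5, 6, 7]  # odd length, to show how leftovers are handled

it = iter(A)
pairs = list(zip(it, it))  # both arguments are the same iterator; the leftover 7 is dropped

# Slicing alternative: produces lists and keeps the incomplete last chunk.
chunks = [A[i:i + 2] for i in range(0, len(A), 2)]

print(pairs)
print(chunks)
```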