INTRO TO PYTHON LECTURE OUTLINE
BME205, Fall 2011
Contact: auzilov@ucsc.edu


(0) Disclaimers and Background

We will probably run out of time; we can continue outside of class
(TA office hours?).

Why Python?
  - Easy for beginners, but many advanced features.
  - Consistent syntax (more or less).
  - Good for prototyping (fast to write).
  - Good for debugging (e.g. stack traces).
  - Powerful standard library:
      - unit tests (module "unittest")
      - profiler (module "profile")
      - regular expressions (module "re")
      - iterators (module "itertools")
      - tests of docstring-embedded examples (module "doctest")
      - and many more!
  - Widely adopted, including by the bioinformatics community.
  - Thriving development community for the language itself and associated
    tools.

But remember: there is no "one true/best language"; you will have to learn
others.

Several Python implementations: CPython, Jython, IronPython, etc.


(1) Resources for Python programming

  - docs.python.org
      - "index" is most useful
      - "Library Reference" is next in usefulness
      - These are the most relevant pages for BME205 Assignment #1:
          Tutorial docs:
            http://docs.python.org/tutorial/interpreter.html
            http://docs.python.org/tutorial/introduction.html
            http://docs.python.org/tutorial/controlflow.html
            http://docs.python.org/tutorial/inputoutput.html
          Understanding strings (and more generally, sequences):
            http://docs.python.org/library/stdtypes.html#string-methods
            http://docs.python.org/library/stdtypes.html#string-formatting-operations
            http://docs.python.org/library/stdtypes.html#sequence-types-str-unicode-list-tuple-buffer-xrange
      - Built-in functions:
          http://docs.python.org/library/functions.html
  - Python in a Nutshell book (useful for advanced users only)
  - Editors/IDEs
      - WingIDE has a free version:
        http://wingware.com/downloads/wingide-101/4.0.4-1/binaries
      - Eclipse with PyDev: http://pydev.org
      - I recommend emacs (Aquamacs on Mac OS X)
          - Use python-mode.el mode (NOT python.el):
            https://launchpad.net/python-mode
          - Auto-completion package available
            (looks in current buffer to append to current dictionary
            on-the-fly):
            http://cx4a.org/software/auto-complete/
            or
            http://www.emacswiki.org/emacs/AutoComplete
          - recommended by John St. John: YaSnippet
            (http://code.google.com/p/yasnippet/);
            can be integrated with AutoComplete
  - epydoc for documentation (it's like javadoc).
  - Mac OS X only: when installing Python, I recommend using MacPorts
    instead of downloading the DMG from python.org.


(2) Starting up Python

  - Python versions: use 2.7.x, do NOT use 3.x yet!
  - Python installation on SOE machines: see Evan the TA for this.
  - Command-line interpreter
      - useful for playing around with one line of code at a time
      - using it as a calculator
      - help() function, takes strings or references:
          help ('print')     # in 2.x "print" is a statement, so pass it as a string
          help ('len')
          help (len)         # same, as long as name is bound to a reference
          help ('modules')
          x = 205
          help (x)
          s = 'some arbitrary string'
          help (s.rstrip)    # help on object methods
  - Whitespace matters! But only in indentation.
  - History (readline support), reverse search.
  - Ctrl+D to exit.


(3) Python types and data model

  - Everything is a reference; the object to which we refer has the type.
  - Binding: associating a (possibly named) reference with an object.
    Sometimes called "variable assignment", but in Python that's a misnomer.
  - Most important types:
      int, float, bool, string (str), list, tuple, dict, None
  - Categories of types
      - mutable vs immutable
      - sequence vs non-sequence
  - Type conversion.
  - Useful reference
      - standard type hierarchy:
        http://docs.python.org/reference/datamodel.html#the-standard-type-hierarchy
      - how to use standard types:
        http://docs.python.org/library/stdtypes.html


(4) Writing your first Python program (demo0.py)

  - Shebang/hash-bang.
  - chmod +x to make program executable.
  - Two ways to run a Python program in a shell
      - Use shebang to locate interpreter (must have done chmod +x):
          ./progName.py
      - Ignore shebang, specify explicitly which Python interpreter to use
        (also ignores chmod +x setting):
          python progName.py
          /usr/local/bin/python2.7 progName.py


(5) Other Python demo programs (demo*.py). See program code to learn.


(6) Modules that I find most useful in bioinformatics work

  - Standard Python library modules
      argparse   - parsing command-line arguments
      array      - memory-efficient arrays
      itertools  - iteration and combinatorics
      random     - generate random numbers, shuffle lists
      re         - regular expressions
      sys        - system-level stuff
    If it is not obvious to you why they are useful, ask me.
    The awesomeness of some modules (e.g. "itertools") may not be obvious.
  - 3rd party modules
      - pysam for SAM/BAM files (sequencing reads):
        http://code.google.com/p/pysam/
      - ruffus for pipelines (replaces makefiles):
        http://ruffus.org.uk/
      - rpy2 for Python/R integration:
        http://rpy.sourceforge.net/rpy2.html


(7) Etc.

  - Functional programming with Python:
    http://docs.python.org/howto/functional.html
    Has fun tidbits such as "how to multiply together all items in a list".
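
A rough sketch of the "multiply together all items in a list" tidbit, for
comparison with an ordinary loop. This is not copied from the howto: the
function names and the sample list are made up for illustration, and the code
is written so that it should run unchanged under both 2.7 and 3.x.

```python
# Sketch only: two ways to multiply together all items in a list.
# Function names and sample data are hypothetical, chosen for illustration.
from functools import reduce   # reduce is a built-in in 2.x; 3.x needs this import
import operator


def product_loop(numbers):
    """Multiply all items with an explicit loop (gives 1 for an empty list)."""
    result = 1
    for n in numbers:
        result *= n
    return result


def product_reduce(numbers):
    """Same result, functional style: fold operator.mul over the items."""
    return reduce(operator.mul, numbers, 1)


values = [1, 2, 3, 4, 5]   # made-up example data
assert product_loop(values) == product_reduce(values) == 120
```

Passing 1 as the third argument to reduce() makes an empty list return 1
instead of raising TypeError.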