Coding¶
General¶
- HackerRank is popular with software developers training for interviews, but it also has nice problems if you just want to improve your computational thinking and coding skills in general
- The Common-Sense Guide to Data Structures and Algorithms: Level Up Your Core Programming Skills by Jay Wengrow is a fast and easy read that introduces data structures, time complexity and some important sorting and searching algorithms
- For file management I sometimes find it useful to use
- tree
- and, for copying files etc., Midnight Commander
- Visualization of the latency numbers every programmer should know (why you want to be in L1 cache; also nice is the humanized version with an SSD random read as a “normal weekend” and L1 cache as a heartbeat)
- Bloom filters are interesting data structures for constant-time lookup in a compact way. I found this blog article instructive: https://prakhar.me/articles/bloom-filters-for-dummies/
- For SQL design and code the WWW SQL Designer tool is pretty useful
- Best Practices for Scientific Computing is a good intro to software development for scientists
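To make the Bloom filter idea from the list above concrete, here is a minimal sketch using only the standard library. The class name, the default sizes and the salted SHA-256 hashing scheme are illustrative assumptions, not taken from the linked article; a real filter would pack the flags into bits and tune the sizes to the expected number of items.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: each item sets num_hashes positions in a fixed-size array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per flag for readability (a real filter would pack bits).
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive num_hashes positions by salting a SHA-256 hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # False means "definitely not added"; True means "probably added".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
for word in ["numpy", "vim", "jupyter"]:
    bf.add(word)

print("vim" in bf)    # an added item is always reported as present
print("emacs" in bf)  # very likely False, but false positives are possible
```

Lookup cost depends only on the number of hash functions, not on how many items were added, which is where the constant-time behaviour comes from.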
Python¶
- Why scientists should learn to program in Python is a neat introduction to Python for natural scientists
Jupyter notebook tricks¶
- Magics are great: %env to list environment variables, ! to run shell commands and %lsmagic to get a list of all of them. You can even do profiling. Another nice one is %pastebin, with which you can select the line numbers that you want to upload to a pastebin
- IPython widgets can be nice for making simple interactive plots (e.g. for educational purposes). The Domino Data Lab blog has a nice overview and an interactive ping plot
- numpy-html is quite neat for rendering NumPy arrays in notebooks.
Jupyter notebooks on HPC environments¶
Using configuration file¶
In some cases it can be useful not to run the IPython kernel on the head node but to submit it to a compute node of the cluster and then access the kernel from the browser (either from the head node or from your local machine).
You can achieve this with a few settings in your ~/.jupyter/jupyter_notebook_config.py file, which you can create using jupyter notebook --generate-config.
In this file you can then modify the following:
c.NotebookApp.ip = '*'
c.NotebookApp.open_browser = False
c.NotebookApp.port = XXXXX
where XXXXX is a number greater than 8888. You might also want to set a password instead of a token (you can do this in the config file or by running jupyter-notebook password).
In your submission script you can then add hostname to print the hostname, which you can then use to access the notebook at hostname:XXXXX. In some cases you might also want to hardcode c.NotebookApp.ip to the IP of a particular compute node and then simply bookmark this address.
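If the job already runs Python, the address can also be printed from there instead of with the hostname command. This is only a sketch: notebook_url is a hypothetical helper, and 8890 is a placeholder for whatever you set as c.NotebookApp.port.

```python
import socket


def notebook_url(port, host=None):
    """Return the address at which a notebook server on this node should be reachable."""
    # socket.gethostname() normally gives the same name the `hostname` command prints.
    host = host or socket.gethostname()
    return f"http://{host}:{port}"


# 8890 is a placeholder; use the value of c.NotebookApp.port from your config.
print(notebook_url(8890))
```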
Using tunneling¶
Add the following lines to your submission script (in this case setting a password is really useful)
unset XDG_RUNTIME_DIR
NODEIP=$(hostname -i)
NODEPORT=$(shuf -i 8888-9999 -n 1)
echo $NODEIP:$NODEPORT
jupyter-notebook --ip=$NODEIP --port=$NODEPORT --no-browser
shuf -i 8888-9999 -n 1 is just a way to get a random port (binding to port 0 and letting the OS choose is better practice, but this is easier). You can then look into the output file that your scheduler produced, as it should contain NODEIP and NODEPORT, which you can use to establish a tunnel connection from your local machine using
ssh -N -L 8888:$NODEIP:$NODEPORT <username>@<machinename>
You can now access the Jupyter server at http://localhost:8888 on your local machine. If you have to use Windows on your local machine, you can set up tunneling using MobaXterm. On some schedulers you may want to enable direct writing of output files; on the most recent version of PBS Pro, this is possible with qsub -koed.
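The remark about binding to port 0 can be sketched in Python with the standard library only. find_free_port is a hypothetical helper name and not part of Jupyter.

```python
import socket


def find_free_port(host=""):
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))          # port 0 means "let the OS choose"
        return sock.getsockname()[1]  # the port that was actually assigned


port = find_free_port()
print(port)
```

The port is released again when the socket is closed, so there is a small window in which another process could take it before the notebook server starts.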
Note that in both approaches the kernel will of course die after the walltime is exceeded.
Speaking of Jupyter notebooks, I like the jupyterthemes package.
Online¶
- The talks by Raymond Hettinger are always valuable and fun
- The Hitchhiker’s Guide to Python has great information
- Bernd Klein also has good information on advanced topics such as metaclasses or memoization with decorators
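As a small illustration of memoization with decorators, mentioned in the list above, here is a minimal sketch. The decorator is written by hand for clarity and is not taken from any of the linked resources; it only handles hashable positional arguments.

```python
import functools


def memoize(func):
    """Cache the results of func, keyed by its positional arguments."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)  # compute once, reuse afterwards
        return cache[args]

    return wrapper


@memoize
def fib(n):
    """Naive recursive Fibonacci; fast only because of the cache."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


print(fib(50))  # 12586269025
```

In practice the standard library's functools.lru_cache does the same job and also supports keyword arguments.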
Faster Python¶
- Consider trying PyPy instead of CPython (check the benchmarks).
- Nice introduction to vectorization, NumPy and Numba (and to where the bottlenecks in Python are) by Donald Whyte
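The interpreter overhead that vectorization avoids can be seen on a small scale with the standard library alone. This sketch is a stand-in rather than NumPy code: it compares an explicit Python loop with the built-in sum, whose loop runs in C. The function name is illustrative, and the timings depend on your machine and interpreter.

```python
import timeit


def loop_sum(values):
    """Sum with an explicit loop: every iteration is executed by the interpreter."""
    total = 0
    for v in values:
        total += v
    return total


values = list(range(1_000_000))

# Both approaches give the same result; only the speed differs.
assert loop_sum(values) == sum(values)

t_loop = timeit.timeit(lambda: loop_sum(values), number=5)
t_builtin = timeit.timeit(lambda: sum(values), number=5)
print(f"loop: {t_loop:.3f}s  built-in sum: {t_builtin:.3f}s")
```

NumPy applies the same principle to whole-array arithmetic, and Numba instead compiles the explicit loop itself.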
C¶
Online¶
- Build Your Own Lisp is a nice way to get started with C and learn about Lisps
Editors/IDEs¶
VIM¶
Vim is a really powerful editor, but you need to spend some time learning and configuring it.
Configuration¶
Some useful settings for the .vimrc file are:
Syntax highlighting
syntax enable
is probably self-explanatory
Search
set incsearch " lookahead search
set ignorecase " in most cases I want to be case-insensitive
set smartcase " unless I explicitly use uppercase
set hlsearch " highlight matches
Indentation
set tabstop=4 " number of spaces per <TAB>
set expandtab " convert <TAB> key-presses to spaces in insert mode
set shiftwidth=4 " indent by 4 spaces per level (>>, << and autoindent)
set autoindent " copy indent from current line when starting a new line
set smartindent " even better autoindent ('smart' insert after e.g. {)
Persistent undo
if has('persistent_undo')
    " Save all undo files in a single location (less messy, more risky)...
    set undodir=$HOME/.VIM_UNDO_FILES
    " Save a lot of back-history...
    set undolevels=5000
    " Actually switch on persistent undo
    set undofile
endif
I am paranoid: I want to lose at most 10 keystrokes
set updatecount=10
If you do not want to type the whole search/replace syntax (see the commands section), remap it
nmap S :%s//g<LEFT><LEFT>
Now you only need to type
SX/Y<CR>
for a global search/replace on all lines.
If you want to see a really crazy setup, check out Damian Conway’s vim setup. There you can also find how to create the Star Wars intro in vim.
Plugins¶
Commands¶
- Use $ to get to the end of the line
- Use different navigation levels: b, w, { and (
- Search/Replace (g means global)
  - all lines
    :%s/foo/bar/g
  - this line
    :s/foo/bar/g
PyCharm¶
PyCharm is the IDE I use for larger Python projects. Some useful features are:
Sublime¶
Sublime is a lot faster than PyCharm and supports basically all languages. For setting it up, the Real Python blog has some useful package recommendations (the package manager in particular is really good). In addition, I would recommend PyYapf and the Flake8 linter.
Development process¶
Starting a project¶
The easiest way to start a (Python) project is to use a cookiecutter that creates the basic project structure and some configuration files for you. A nice one in the field of molecular simulations is the Cookiecutter for Computational Molecular Sciences Python Packages.
CI/CD¶
Docker¶
On HPC environments, where you don’t have root rights, Singularity might be the way to go. There is also an image to convert Singularity images to Docker images.
Git(hub)¶
Pre-Commit¶
Documentation¶
- reStructuredText Quick Reference: useful when writing Sphinx docs