Intermediate Python

1. *args and **kwargs
1.1. Usage of *args
1.2. Usage of **kwargs
1.3. Using *args and **kwargs to call a function
1.4. When to use them?
2. Debugging
3. Generators
3.1. Iterable
3.2. Iterator
3.3. Iteration
3.4. Generators
4. Map & Filter
4.1. Map
4.2. Filter
5. set Data Structure
6. Ternary Operators
7. Decorators
7.1. Everything in python is an object:
7.2. Defining functions within functions:
7.3. Returning functions from within functions:
7.4. Giving a function as an argument to another function:
7.5. Writing your first decorator:
8. Global & Return
8.1. Multiple return values
9. Mutation
10. __slots__ Magic
11. Virtual Environment
12. Collections
12.1. defaultdict
12.2. counter
12.3. deque
12.4. namedtuple
12.5. enum.Enum (Python 3.4+)
13. Enumerate
14. Object introspection
14.1. dir
14.2. type and id
14.3. inspect module
15. Comprehensions
15.1. list comprehensions
15.2. dict comprehensions
15.3. set comprehensions
16. Exceptions
16.1. Handling multiple exceptions:
17. Lambdas
18. One-Liners
19. For – Else
19.1. else clause:
20. Open function
21. Targeting Python 2+3
22. Coroutines
23. Function caching
23.1. Python 3.2+
23.2. Python 2+
24. Context managers
24.1. Implementing Context Manager as a Class:
24.2. Handling exceptions
24.3. Implementing a Context Manager as a Generator

Storing large binary files in git repositories

Git-annex works by storing the contents of the files it tracks in a separate location. What is stored in the repository is a symlink to the key under that separate location. To share large binary files within a team, for example, the tracked files need to be stored in a different backend. At the time of writing (23rd of July 2015), the following backends were available: S3 (Amazon S3 and other compatible services), Amazon Glacier, bup, ddar, gcrypt, directory, rsync, webdav, tahoe, web, bittorrent, and xmpp. Storing contents in a remote of your own devising via hooks is also supported.
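The symlink-to-key idea can be illustrated with a minimal Python sketch. It is not git-annex's implementation: the function name `annex_file`, the `SHA256-<digest>` key naming and the store directory are assumptions made for illustration, and real git-annex keys and directory layout are more elaborate.

```python
import hashlib
import os
import tempfile


def annex_file(path, store_dir):
    """Move a file's content into store_dir and leave a symlink behind."""
    with open(path, "rb") as handle:
        digest = hashlib.sha256(handle.read()).hexdigest()
    # Hypothetical key layout; real git-annex keys carry more information.
    key_path = os.path.join(store_dir, "SHA256-" + digest)
    os.makedirs(store_dir, exist_ok=True)
    if os.path.exists(key_path):
        os.remove(path)  # identical content is already in the store
    else:
        os.replace(path, key_path)
    # The working tree now holds only a link that points at the key.
    os.symlink(key_path, path)
    return key_path


def demo():
    """Annex a 1 KiB file in a temporary directory and inspect the result."""
    with tempfile.TemporaryDirectory() as workdir:
        big_file = os.path.join(workdir, "video.bin")
        with open(big_file, "wb") as handle:
            handle.write(b"\x00" * 1024)
        key_path = annex_file(big_file, os.path.join(workdir, "store"))
        with open(big_file, "rb") as handle:
            size = len(handle.read())
        return os.path.islink(big_file), os.path.basename(key_path), size
```

Reading the file through the symlink still returns the original bytes, which is why the approach depends on symlink support in the underlying platform.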

Git-annex uses separate commands for checking out and committing files, which makes its learning curve a bit steeper than that of alternatives that rely on filters. Git-annex is written in Haskell, and the majority of it is licensed under the GPL, version 3 or higher. Because git-annex uses symlinks, Windows users are forced to use a special direct mode that makes usage less intuitive.

The latest version of git-annex at the time of writing is 5.20150710, released on the 10th of July 2015, and the earliest article I found on their website was dated 2010. Both facts suggest that the project is quite mature.

Git Large File Storage (Git LFS)
The core Git LFS idea is that instead of writing large blobs to a Git repository, only a pointer file is written. The blobs are written to a separate server using the Git LFS HTTP API. The API endpoint can be configured per remote, which allows multiple Git LFS servers to be used. Git LFS requires a specific server implementation to communicate with. An open source reference server implementation is available, as well as at least one other server implementation. The Git LFS server can offload storage to cloud services such as S3, or to pretty much anything else if you implement the server yourself.
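To make the pointer-file idea concrete, here is a small Python sketch that builds a pointer for a blob. The three-line layout (version, oid, size) follows the Git LFS pointer specification v1 as I understand it; the 0.5.2 release discussed here may differ in details, and the function name `make_pointer` is hypothetical.

```python
import hashlib


def make_pointer(content: bytes) -> str:
    """Build the small text stand-in that is committed instead of the blob."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


# A large blob collapses to a pointer of roughly 130 bytes.
pointer = make_pointer(b"\x00" * 5_000_000)
```

The repository therefore grows by the size of the pointer, not the blob, while the blob itself is uploaded to the LFS server under its object id.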

Git LFS uses a filter-based approach, meaning that you only need to specify the tracked files with one command, and it handles the rest invisibly. The upside of this approach is ease of use; however, there is currently a performance penalty because of how Git works internally. Git LFS is licensed under the MIT license and written in Go, and binaries are available for Mac, FreeBSD, Linux, and Windows. The version of Git LFS is 0.5.2 at the time of writing, which suggests it is still at quite an early stage, although there were already 36 contributors to the project. As the version number is still below 1, changes to the APIs, for example, can be expected.

git-bigfiles – Git for big files
The goals of git-bigfiles are pretty noble: making life bearable for people using Git on projects hosting very large files, and merging back as many changes as possible into upstream Git once they’re of acceptable quality. Git-bigfiles is a fork of Git; however, the project seems to have been dead for some time. Git-bigfiles is developed using the same technology stack as Git and is licensed under the GNU General Public License version 2 (some parts of it are under different licenses, compatible with the GPLv2).

Git-fat works in a similar manner to Git LFS. Large files can be tracked using filters in the .gitattributes file. The large files are stored in any remote that can be reached through rsync. Git-fat is licensed under the BSD 2-clause license. Git-fat is developed in Python, which creates more dependencies for Windows users to install. However, the installation itself is straightforward with pip. At the time of writing, git-fat has 13 contributors and the latest commit was made on the 25th of March 2015.

Licensed under the MIT license and supporting a similar workflow to the above-mentioned alternatives Git LFS and git-fat, git-media is probably the oldest of the solutions available. Git-media uses a similar filter approach and supports Amazon’s S3, a local filesystem path, SCP, atmos and WebDAV as backends for storing large files. Git-media is written in Ruby, which makes installation on Windows not so straightforward. The project has 9 contributors on GitHub, but the latest activity was nearly a year ago at the time of writing.

Git-bigstore was initially implemented as an alternative to git-media. It works similarly to the others above by storing a filter property in .gitattributes for certain types of files. It supports Amazon S3, Google Cloud Storage, or a Rackspace Cloud account as backends for storing binary files. Git-bigstore claims to improve stability when multiple people collaborate. Git-bigstore is licensed under the Apache 2.0 license. As git-bigstore does not use symlinks, it should be more compatible with Windows. Git-bigstore is written in Python and requires Python 2.7+, which means Windows users might need an extra step during installation. The latest commit to the project’s GitHub repository at the time of writing was made on April 20th, 2015, and there is one contributor to the project.

Git-sym is the newest player in the field, offering an alternative to how large files are stored and linked in git-lfs, git-annex, git-fat and git-media. Instead of calculating checksums of the tracked large files, git-sym relies on URIs. Unlike its rivals, which also store the checksum, git-sym stores only the symlinks in the git repository. The benefits of git-sym are thus performance and the ability to symlink whole directories. Because of its nature, the main downside is that it does not guarantee data integrity. Git-sym is operated through separate commands. Git-sym also requires Ruby, which makes it more tedious to install on Windows. The project has one contributor according to its project home page.

Python Flux Reconstruction

PyFR is an open-source Python based framework for solving advection-diffusion type problems on streaming architectures using the Flux Reconstruction approach of Huynh. The framework is designed to solve a range of governing systems on mixed unstructured grids containing various element types. It is also designed to target a range of hardware platforms via use of an in-built domain specific language derived from the Mako templating engine. The current release (PyFR 1.0.0) has the following capabilities:

Governing Equations – Euler, Navier Stokes
Dimensionality – 2D, 3D
Element Types – Triangles, Quadrilaterals, Hexahedra, Prisms, Tetrahedra, Pyramids
Platforms – CPU Clusters, Nvidia GPU Clusters, AMD GPU Clusters
Spatial Discretisation – High-Order Flux Reconstruction
Temporal Discretisation – Explicit Runge-Kutta
Precision – Single, Double
Mesh Files Imported – Gmsh (.msh)
Solution Files Exported – Unstructured VTK (.vtu, .pvtu)

PyFR is being developed in the Vincent Lab, Department of Aeronautics, Imperial College London, UK.

Development of PyFR is supported by the Engineering and Physical Sciences Research Council, Innovate UK, the European Commission, BAE Systems, and Airbus. We are also grateful for hardware donations from Nvidia, Intel, and AMD.

PyFR 1.0.0 has a hard dependency on Python 3.3+ and the following Python packages:

h5py >= 2.5
mako >= 1.0.0
mpi4py >= 1.3
mpmath >= 0.18
numpy >= 1.8
pytools >= 2014.3

Note that due to a bug in numpy, PyFR is not compatible with 32-bit Python distributions.
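The requirements above can be checked before installation with a short script. The following is only a sketch: the helper names are hypothetical, the version parser handles plain dotted numbers and ignores suffixes such as "rc1", and the installed versions are passed in as a dictionary rather than discovered automatically.

```python
import sys

# Minimum versions copied from the list above.
MINIMUM_VERSIONS = {
    "h5py": "2.5",
    "mako": "1.0.0",
    "mpi4py": "1.3",
    "mpmath": "0.18",
    "numpy": "1.8",
    "pytools": "2014.3",
}


def parse_version(text):
    """Turn '1.8.2' into (1, 8, 2); parsing stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two dotted versions after padding them to equal length."""
    have, need = parse_version(installed), parse_version(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def missing_requirements(installed_versions):
    """Return the packages that are absent or older than required."""
    return sorted(
        name
        for name, minimum in MINIMUM_VERSIONS.items()
        if name not in installed_versions
        or not meets_minimum(installed_versions[name], minimum)
    )


def python_is_supported():
    """Python 3.3+ on a 64-bit build, as the note above requires."""
    return sys.version_info >= (3, 3) and sys.maxsize > 2**32
```

For example, `missing_requirements({"numpy": "1.7.1"})` reports every package in the list, because numpy is too old and the others are absent.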

CUDA Backend
The CUDA backend targets NVIDIA GPUs with a compute capability of 2.0 or greater. The backend requires:

CUDA >= 4.2
pycuda >= 2011.2

OpenCL Backend
The OpenCL backend targets a range of accelerators including GPUs from AMD and NVIDIA. The backend requires:

pyopencl >= 2013.2

OpenMP Backend
The OpenMP backend targets multi-core CPUs. The backend requires:

GCC >= 4.7
A BLAS library compiled as a shared library (e.g. OpenBLAS)

Running in Parallel
To partition meshes for running in parallel it is also necessary to have one of the following partitioners installed:

metis >= 5.0
scotch >= 6.0



Building blocks with a wide range of modules
The well-matched Tinkerforge modules allow experienced programmers to concentrate on the software, so projects can be completed faster. A programming novice, on the other hand, can learn programming through exciting applications built from the Tinkerforge building blocks.

No detailed knowledge in electronics necessary
Realizing a project with Tinkerforge is straightforward. You simply pick the required modules and connect them to each other. No further electronics knowledge and no soldering are needed.
For example: if the project is to control a motor depending on a measured temperature, you just have to choose a temperature sensor and an appropriate motor controller from the available Tinkerforge building blocks.
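The temperature-to-motor example can be sketched in Python. Everything below is hypothetical: `FakeTemperatureSensor` and `FakeMotorController` are stand-ins written for this illustration and are not classes from the Tinkerforge bindings, and the thresholds are arbitrary. The sketch only shows the shape of the control logic.

```python
class FakeTemperatureSensor:
    """Hypothetical stand-in for a temperature sensor module."""

    def __init__(self, celsius):
        self.celsius = celsius

    def get_temperature(self):
        return self.celsius


class FakeMotorController:
    """Hypothetical stand-in for a motor controller module."""

    def __init__(self):
        self.velocity = 0.0

    def set_velocity(self, velocity):
        self.velocity = velocity


def velocity_for(celsius, start=25.0, full=45.0, max_velocity=1.0):
    """Ramp linearly from 0 at `start` degrees to max_velocity at `full` degrees."""
    if celsius <= start:
        return 0.0
    if celsius >= full:
        return max_velocity
    return max_velocity * (celsius - start) / (full - start)


def control_step(sensor, motor):
    """Read the sensor once and update the motor accordingly."""
    motor.set_velocity(velocity_for(sensor.get_temperature()))
    return motor.velocity
```

With real hardware, the two stand-in classes would be replaced by the objects provided by the vendor's bindings, whose names and units should be taken from the official documentation.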

Intuitive API
The Tinkerforge API offers intuitive functions that simplify programming. For example, it is possible to set the velocity of a motor in meters per second with a call to setVelocity(), or to read out a temperature in degrees Celsius (°C) with getTemperature().

Install Memcached Server For Python and PHP Apps

Memcached is a general-purpose distributed memory caching system. It is usually used to speed up dynamic database-driven web apps or websites by caching objects in RAM, which often reduces database load. You need to install the following packages:

  1. memcached – A high-performance memory object caching server
  2. php5-memcache – Memcache extension module for PHP5
  3. php5-memcached – Memcached extension module for PHP5, uses libmemcached
  4. python-memcache – Pure python memcached client

Let us add a key called foo with the value BAR. Enter:

echo -e 'add foo 0 300 3\r\nBAR\r' | nc localhost 11211
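The fields of that command follow the memcached text protocol: key, flags, expiry time in seconds, and the byte length of the value, followed by the value itself. The Python sketch below builds the same bytes and, optionally, sends them over a socket. The function names are hypothetical, and `send_command` assumes a server listening on localhost:11211.

```python
import socket


def build_add_command(key, value, flags=0, exptime=300):
    """Encode 'add <key> <flags> <exptime> <bytes>' plus the data block."""
    data = value.encode("utf-8")
    header = "add {} {} {} {}\r\n".format(key, flags, exptime, len(data))
    return header.encode("ascii") + data + b"\r\n"


def send_command(command, host="localhost", port=11211, timeout=2.0):
    """Send one command and return the first chunk of the server's reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(command)
        return conn.recv(1024).decode("ascii", "replace").strip()


# Same bytes as the shell command: the trailing 3 is len("BAR").
command = build_add_command("foo", "BAR")
```

The length field must match the value exactly, otherwise the server rejects the data block. The reply to the shell command itself is shown in the sample output that follows.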

Sample outputs:

STORED

Proving that Android’s, Java’s and Python’s sorting algorithm is broken (and showing how to fix it)

Tim Peters developed the TimSort hybrid sorting algorithm in 2002. It is a clever combination of ideas from merge sort and insertion sort, designed to perform well on real-world data. TimSort was first developed for Python, but later ported to Java (where it appears as java.util.Collections.sort and java.util.Arrays.sort) by Joshua Bloch (the designer of Java Collections, who also pointed out that most binary search algorithms were broken). TimSort is today used as the default sorting algorithm for the Android SDK, Sun’s JDK and OpenJDK. Given the popularity of these platforms, this means that the number of computers, cloud services and mobile phones that use TimSort for sorting is well into the billions.
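The combination of insertion sort and merge sort can be illustrated with a deliberately simplified Python sketch. This is not TimSort: it uses fixed-size runs and merges them pairwise, whereas the real algorithm detects runs already present in the data and manages a stack of pending merges. The function names and the `run_size` default are assumptions made for illustration.

```python
def insertion_sort(items, lo, hi):
    """Sort items[lo:hi] in place; cheap for short ranges."""
    for i in range(lo + 1, hi):
        current = items[i]
        j = i - 1
        while j >= lo and items[j] > current:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = current


def merge(left, right):
    """Merge two sorted lists, taking from `left` on ties to stay stable."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if right[j] < left[i]:
            merged.append(right[j])
            j += 1
        else:
            merged.append(left[i])
            i += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def hybrid_sort(items, run_size=32):
    """Insertion-sort fixed-size runs, then merge the runs pairwise."""
    data = list(items)
    for lo in range(0, len(data), run_size):
        insertion_sort(data, lo, min(lo + run_size, len(data)))
    runs = [data[lo:lo + run_size] for lo in range(0, len(data), run_size)]
    while len(runs) > 1:
        runs = [
            merge(runs[k], runs[k + 1]) if k + 1 < len(runs) else runs[k]
            for k in range(0, len(runs), 2)
        ]
    return runs[0] if runs else []
```

Insertion sort handles the short runs, where its low overhead pays off, and merging combines them in O(n log n) overall.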

Fast forward to 2015. After we had successfully verified Counting and Radix sort implementations in Java (J. Autom. Reasoning 53(2), 129-139) with a formal verification tool called KeY, we were looking for a new challenge. TimSort seemed to fit the bill, as it is rather complex and widely used. Unfortunately, we weren’t able to prove its correctness. A closer analysis showed that this was, quite simply, because TimSort was broken, and our theoretical considerations finally led us to a path towards finding the bug (interestingly, that bug already appears in the Python implementation). This blog post shows how we did it.

git-fat

Checking large binary files into a source repository (Git or otherwise) is a bad idea because repository size quickly becomes unreasonable.

  • clones of the source repository are small and fast because no binaries are transferred, yet fully functional with complete metadata and incremental retrieval (git clone --depth has limited granularity and couples metadata to content)
  • git-fat supports the same workflow for large binaries and traditionally versioned files, but internally manages the “fat” files separately
  • git-bisect works properly even when versions of the binary files change over time
  • selective control of which large files to pull into the local store
  • local fat object stores can be shared between multiple clones, even by different users
  • can easily support fat object stores distributed across multiple hosts
  • depends only on stock Python and rsync

Primiano Tucci: Large scale Git history rewrites

I recently faced the challenging problem of rewriting the history of a very large project (170k commits, 2.8 million objects, 5GB compressed).

The rewrite process consisted of replacing binary files in the repo (.png and other extensions, circa 50k per revision) with textual URLs. git-filter-branch is the reference tool for doing these kinds of operations. Unfortunately, for various reasons explained in this post, git-filter-branch doesn’t scale (it would have taken ~months on a bulky machine). The nature of the problem makes it not (easily) suitable for tools like BFG, which scales easily on simpler instances of the problem.

Use tmpfs as a scratch disk.

Commit rewrites are not parallelizable.

Tree rewrites are parallelizable.
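The difference between the two can be sketched with a toy model in Python. This is not the tooling used in the post: trees are flattened to dictionaries, ids are plain SHA-1 digests of made-up payloads, and the URL scheme is a placeholder. The point is only that a rewritten tree depends on nothing but its own content, while a rewritten commit needs the new id of its parent.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

BINARY_SUFFIXES = (".png",)


def rewrite_tree(tree):
    """Replace binary entries with a textual URL; uses only the tree itself."""
    return {
        name: (
            f"https://example.invalid/{hashlib.sha1(data).hexdigest()}".encode()
            if name.endswith(BINARY_SUFFIXES)
            else data
        )
        for name, data in tree.items()
    }


def tree_id(tree):
    """Toy content hash of a flattened tree (real Git trees are nested)."""
    digest = hashlib.sha1()
    for name in sorted(tree):
        digest.update(name.encode() + b"\0" + tree[name] + b"\0")
    return digest.hexdigest()


def rewrite_history(commits, trees):
    """commits: (id, parent_id, tree_id, message) tuples, parents listed first."""
    # Phase 1: trees are independent of each other, so a pool can map over them.
    old_ids = list(trees)
    with ThreadPoolExecutor() as pool:
        new_trees = list(pool.map(rewrite_tree, (trees[t] for t in old_ids)))
    tree_map = {old: tree_id(new) for old, new in zip(old_ids, new_trees)}

    # Phase 2: each new commit id covers its parent's new id, so go in order.
    commit_map = {}
    rewritten = []
    for old_id, parent, old_tree, message in commits:
        new_parent = commit_map.get(parent)
        payload = f"{tree_map[old_tree]}|{new_parent}|{message}".encode()
        new_id = hashlib.sha1(payload).hexdigest()
        commit_map[old_id] = new_id
        rewritten.append((new_id, new_parent, tree_map[old_tree], message))
    return rewritten
```

A thread pool keeps the sketch short; for CPU-bound rewriting, a process pool would be the more natural choice.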