Data scientist is often called the sexiest job in the world, or at least in the tech world, and it is true that the results you can get by applying Machine Learning techniques sometimes look like sorcery. That is why a lot of people are trying to become data scientists.

Many of them use Python, and almost all the online courses also use Python to teach the concepts.



Python is a lovely programming language. It is so easy to get into and the final code is, sometimes, even beautiful. But all these things mean nothing when you face the GIL. There are some solutions, like multiprocessing, which works well when you can split the data into big chunks. Other approaches apply horizontal scaling techniques in order to get vertical scaling. What does that mean? If you have read about the GIL, you already know that a single Python (CPython) process only runs Python code on one CPU core at a time. So why don't we launch more Pythons and split the load between them using load balancing, messaging systems, and so on?
You can see that the first approach is meant to scale up and the second one to scale out. But both seem like too much for someone who only wants to write some Python code, so let's get back to the data science world.
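As a rough illustration of the multiprocessing route, here is a minimal sketch that splits a list into a few big chunks and sums the squares of each chunk in a separate process. The helper names (`split_into_chunks`, `sum_of_squares`, `parallel_sum_of_squares`) are made up for this example; only `multiprocessing.Pool` comes from the standard library.

```python
from multiprocessing import Pool


def split_into_chunks(data, n_chunks):
    """Split data into n_chunks contiguous slices of nearly equal size."""
    size, extra = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks get one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """CPU-bound work done independently on each chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, processes=4):
    """Give each worker process one big chunk and combine the partial results."""
    chunks = split_into_chunks(data, processes)
    with Pool(processes=processes) as pool:
        return sum(pool.map(sum_of_squares, chunks))


if __name__ == "__main__":
    # Each worker is a separate interpreter with its own GIL.
    print(parallel_sum_of_squares(list(range(100_000))))
```

This works nicely on one machine, but every chunk has to be pickled and sent to a worker, which is why it pays off mainly when the chunks are big and the work per chunk is heavy.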


Writing an algorithm in Python is a pleasure and the ecosystem is great. Libraries like numpy or scipy will help you a lot, and you can also find higher-level libraries like scikit-learn or tensorflow. So, how could I not use Python for Machine Learning?

Some weeks ago I spent some time watching videos and presentations about Erlang, and I remembered two things:
  • Francesco Cesarini, Founder & Technical Director of Erlang Solutions, defined Erlang as an orchestration language.
  • Demonware, the company behind the infrastructure for Call of Duty, uses Erlang to handle connections and tasks and, especially, to control Python (slides).


So why don't I try to handle my Python code using Elixir? That way I will be able to scale and, especially, to add fault tolerance.
With these things in mind I began to code Piton, a library that uses Erlang Ports, thanks to ErlPort, to let Elixir and Python communicate directly.

The first step is to have a Python project. I am going to use a simple example: a Fibonacci calculator.



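# Assumed location: python_folder/functions.py, matching the path and the
# :functions atom used in the Elixir module below.
def fib(n):
    if n < 0:
        raise Exception("No negative values !!!")
    if n == 0:
        return 0
    if n < 3:
        return 1
    return fib(n - 1) + fib(n - 2)

Then, create your own port module using Piton.Port: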

defmodule MyPoolPort do
  use Piton.Port
  def start(), do: MyPoolPort.start([path: Path.expand("python_folder"), python: "python"], [])
  # Wrapper around the Python function fib defined in the functions module
  def fib(pid, n), do: MyPoolPort.execute(pid, :functions, :fib, [n])
end

It is mandatory to have a start() function that provides the path to the Python project and the Python interpreter to use, which can belong to a virtual environment. Then you can define as many functions as you need. I recommend creating wrappers for the execute() function, which only needs the pid of the process connected to one Python, the atom of the Python module, the atom of the Python function and a list of arguments for the Python function.
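Conceptually, the module atom, the function atom and the argument list map onto a dynamic call on the Python side. The sketch below is not ErlPort's or Piton's actual code; `call_by_name` is a hypothetical helper that only illustrates the idea using the standard `importlib` module.

```python
import importlib


def call_by_name(module_name, function_name, args):
    """Import module_name, look up function_name and call it with args."""
    module = importlib.import_module(module_name)
    function = getattr(module, function_name)
    return function(*args)


# Stand-in for execute(pid, module, function, args), using a stdlib module
# so the example runs without the article's project on the path.
print(call_by_name("math", "factorial", [5]))  # prints 120
```

Seen this way, the wrappers in the port module are just a convenient place to fix the module and function names so callers only pass the arguments.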

Now we only have to launch our Piton.Pool, indicating which module it is going to use and the number of Pythons we want to run, and use it:



iex> {:ok, pool} = Piton.Pool.start_link([module: MyPoolPort, pool_number: 2], [])
{:ok, #PID<0.176.0>}

iex> Piton.Pool.execute(pool, :fib, [20])
6765

Thanks to Elixir, even if your Python code raises exceptions, the Piton.Pool will always have the indicated number of Python interpreters ready.
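The idea of a pool that always keeps a fixed number of workers can be illustrated in plain Python. This is only a conceptual toy, not how Piton or OTP supervision is implemented: `Worker` and `SelfHealingPool` are invented names, and a "crash" is simulated by an exception that makes the pool discard the worker and start a new one.

```python
import itertools


class Worker:
    """Stands in for one Python interpreter attached to a port."""

    _ids = itertools.count(1)

    def __init__(self):
        self.id = next(Worker._ids)

    def run(self, function, args):
        return function(*args)


class SelfHealingPool:
    """Keeps a fixed number of workers, replacing any that fail."""

    def __init__(self, size):
        self.workers = [Worker() for _ in range(size)]

    def execute(self, function, args):
        worker = self.workers.pop(0)        # take the next free worker
        try:
            result = worker.run(function, args)
        except Exception as error:
            self.workers.append(Worker())   # worker is gone: start a fresh one
            return ("error", error)
        self.workers.append(worker)         # healthy worker goes back to the pool
        return ("ok", result)


def fib(n):
    if n < 0:
        raise ValueError("No negative values")
    return n if n < 2 else fib(n - 1) + fib(n - 2)


pool = SelfHealingPool(size=2)
print(pool.execute(fib, [20]))      # ('ok', 6765)
print(pool.execute(fib, [-1])[0])   # error
print(len(pool.workers))            # 2
```

The caller sees an error for the failing call, but the pool size never drops, which is the property the Elixir side gives you for real interpreters.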

You can check the GitHub repository, the docs and the Hex package.

Python loves Elixir!!