Python requires no introduction to process excel, images, web page or any thing you point your fingers. There is a quick thing that you can build it fast.
When we write program in a linear fashion there will be a situation we might end up starring the console or UI for some results to come to an end.
What I mean is a python program runs on a single thread by default and in a single processor. It will consume only that much resource and work that fast only.
To take the advantage of hardware at hand, all the processor cores and memory we will have to use more tools in the python language.
Any program performance can be decided by demand on the IO and Processing Power required by our application.
Using the same factors we can pick a solution to our problem of improving the program performance. Or simple put to identify what tools to use in the language.
First let me define what is IO ?
IO is Input/Output, these are communications from program to file system, database, internet etc. The higher the volume more is the demand on IO. And at the same time a conventional program will have scenarios of idle time, waiting for the responses to come. Instead it can process the volume of requests still to be made.
The Processing Power is the volume of data computation that needs to be performed. To utilise the complete processor core and all the cores of the computer.
I will describe as in above image based on the demand for IO and Processing Power required what solutions can be used.
I will give some basic example with code, but it will help you if you have already used threading, multiprocessing, async and message queues as a standalone solutions.
I wanted to give a big picture to chose the solutions when required.
1)When your application has more IO demand than data processing, then you can go for multi-threading implementation. It can definitely be considered for improving an existing processing need as well.
2) If you have more IO demand than the need for processing power, you can also consider async operations instead of threading.
3)If you are in need of more processing power and may or may not require higher IO, then you should go for multiprocessing implementation and multiple thread inside each process.
4)If you are in need of higher IO and higher Processing power at the same time, then you should plan to use MessageQueue with Multiple workers.
Multi-threading in python
Below is an example were, you have a function called send_message. (Assume you are replying to your customers on your ecommerce store, and to appear at ease the automated messages are sent at 1 second interval). It’s just a dummy function so only print to the console. If I have many people to send to then if I do one by one and it can take upto ~12 seconds.
What if I can send to all the customers almost at the same time. There comes multi-threading to rescue. The program is modified as below and it can complete in 4 seconds. Remember sleep commands, Our send_message function will take a minimum of 3 second for 1 customer, so it’s good that for all 3 customers it took only 4 seconds. (Please don’t see the console output, I know its a mess. Just consider instead of printing, it is some API to where messages are sent. Then we are good with this same program).
Multi-processing in python
Multiprocessing tries to execute each created process in different cores. The assignment of core and the process is automatic based on availability of computing resource. The same above example is implemented in multiprocessing below. This will takes almost the same time, but you might see a slightly higher time taken of 4.2 seconds, because of the process creation overhead. If we were doing heavy calculations then this would have been better than multi-threading. Remember the chart explained above.
Async in python
Let me re-write the same example in terms of async operations. Async is more of the multi-threading category. All the async functions that we create will run in same process only. The difference here is the cpu time for instruction execution is handled automatically by the async methods. There are both advantages and disadvantages when it comes to ease of implementation. I’m not going into the details in this article. When it comes to measurement of IO performance especially situations like requesting thousands of web API then this is slightly better in terms of performance compared to multi-threading.
This execution will take almost the same time as multi-threading ~ 4 seconds.
Message-Queue & Workers in python
When you keep creating more number of threads and processes the computer resources will starve and there will be a traffic jam. To handle loads of process, to act as buffer and ease execution as suitable for resources, a message queue structure comes to rescue. All the functions that we wish to execute can be put into the queue, and we can have a function that reads from the queue and executes in different cores or a single core (Depending on how many workers we start running)
Assume if we have to send 1000 messages per second from our program and have to do some analytics as well, then the need for this program will make sense. I will translate the same above program to execute using the message queue.
We will need the following,
1)A message queue application, I will use redis database and its message queue.
Install redis software and start the server using: redis-server from the terminal.
2)A client program which will add jobs to the queue. The actual task function should be in a separate module, not to be present in main file.
3)A server program which will read from the redis queue and perform the job. This will be called as worker. This can be started as many instance as we need and computer has required computing resource as well.
It is also possible the worker program can be in different computers, still it can do the job.
The jobs results (return value of function) will be stored in the redis database. We can read it from redis on the client side. I haven’t added that code above, but be aware it is possible. The time to completion will be almost same as multiprocessing.
Redis is robust, popular and resources to develop are widely available. One downside is it is not officially supported on windows.
I fear I do not have a scale of measure for comparing IO vs Processing Power, but the examples should have given a fair idea.
Find all the code examples here in this repo.
If you have any other thoughts or different approach, do share with me.
Thanks for reading this far!
(This article was originally published in my linkedin newsletter “Tech Square”)