Tuesday, 13 June 2017

Using Git as a Local Repository for Your Existing Projects

Git is version control software that helps you track the different versions of your projects.

Git Installation For Windows

Regarding using git in Windows, I found the following post to be useful.

For my Win7 PC, I chose to use Cmder.

Creating a new Git repository from an existing project

Say you’ve got an existing project that you want to start tracking with git. The first step is to open a Bash console for the directory containing the project. 

In my case, since I am using Cmder, I can do it through its integration with Windows File Explorer.

Then in the console:

  • Type git init
    • This creates a hidden folder for Git administration (called ".git").
  • Add ignore patterns to the repository's local exclude file (called ".git\info\exclude"). These patterns match files that you don't want to add to the repository, e.g. "*.obj".
  • Type git add . 
    • Note that you have to type a period at the end. The use of the period means: add all relevant files from the current folder into git's staging area.
  • Type git commit -m "Initial version"
    • This commits all the files in the staging area as a single version. The -m flag allows you to give a meaningful message to this commit.
Note: if you get an error saying that Git is unable to detect your email or name, type the following commands:

 git config --global user.email "yourEmail"
 git config --global user.name "yourName"

Git GUI Client

To easily manage the versions inside the repository, I have used SmartGit as a GUI client. It is also integrated with Windows File Explorer.

You can use SmartGit to view the repository created in the earlier section. It is really easy to use. You can use it to commit new versions into the repository. Remember that you can add messages to describe each new version.

GitHub

For more information about connecting to GitHub to download projects, you can refer to http://kbroman.org/github_tutorial/pages/init.html

Other References

https://git-scm.com/book/en/v2/Git-Basics-Recording-Changes-to-the-Repository


Thursday, 23 February 2017

Memory Leak Detection using Redgate ANTS Memory Profiler

For High-Availability (HA) software applications, it is important to ensure that there are no memory leaks. I would like to share my experience with a useful memory leak detection tool for C#: the ANTS Memory Profiler from Redgate. This tool helps you track down the source of the leaks.

To explain how the tool detects leaks, I am using our C# application that has a subtle memory leak. 
It has a Master object which communicates with its Worker objects using an event. We know there is a leak because every time we create a Worker object and destroy it again, our memory usage increases.

Code Listing

public class Master
{
    public event EventHandler Report;
    private void RaiseEvent()
    {
        EventHandler handler = Report; //publisher-subscriber idiom
        if (handler != null)
        {
            handler(this, new EventArgs());
        }
    }

}
public class Worker
{
    private byte[] data;

    public Worker(Master p)
    {
        data = new byte[10000];
        p.Report += OnReport;
    }
    private void OnReport(object sender, EventArgs e) //event handler
    {//report status
    }
}
class Program
{
    private static Master p = new Master();
    public static void Main(string[] args)
    {
        Console.WriteLine("Do first ANTS snapshot here, then press any key to continue...");
        Console.ReadKey(true);
        for (int i = 0; i <= 1000; i++)
        {
            Worker w = new Worker(p);
            UseThenDispose(w);
        }
        Console.WriteLine("Do second ANTS snapshot here, then press any key to continue...");
        Console.ReadKey(true);

    }
    private static void UseThenDispose(Worker w)
    {
        Console.WriteLine("Worker doing some work ...");
        w = null; //free the memory
        Console.WriteLine("Worker's resources are disposed.");
    }
}

Starting ANTS

When you open the profiler (choosing a "New Profiling Session"), you see the Startup screen:

All we need to do is point it at our C# application (Worker.exe), and click "Start profiling".
The profiler starts up Worker.exe and begins collecting performance counter data:

Taking Memory Snapshots

Taking and comparing memory snapshots is a key activity when looking for memory leaks, so our approach will be as follows:

1. Wait for memory usage to stabilize, then take the first snapshot by clicking on the Take Memory Snapshot button near the top right hand corner. This first snapshot will be used as a baseline.

When we click the Take Memory Snapshot button, the memory profiler forces a full garbage collection and takes a snapshot of the memory that the application is using (as seen below).


In our code, the memory usage stabilizes at line 32 of the listing (the first Console.WriteLine call in Main). For ease of understanding, the code writes a message to the console window at this point.

2. Then perform the actions that we think cause the memory leak.

In our code, the leak has occurred by the time we arrive at line 39 of the listing (the second snapshot prompt in Main). Again, for ease of understanding, the code writes a message to the console window at this point.

If you look carefully at the memory graph below, you can see a faint green line, which shows that the Private Bytes memory usage has increased, so take a second snapshot. Private Bytes is explained below.

ANTS is able to plot different types of memory usage simultaneously. Examples of the types of memory usage seen in the diagram above are as follows:
  • Private Bytes: memory allocated exclusively to the process (even if not in use). It excludes memory shared with other processes (e.g. DLLs loaded in memory and the .NET runtime).
  • Working Set: includes memory shared with other processes as well.

3. Examine the comparison that the profiler shows us after it has finished taking and analyzing the second snapshot. This will be further explained in the next section.

Comparison of Memory Usage


The ANTS summary screen above shows a lot of information which I will explain one by one. You can follow along by looking at the octagon-shaped labels inside the summary screen.

  1. We can see a large memory increase between snapshots: 9.7 MB vs 52 kB.
  2. The largest classes are shown to us in the bottom right of the screen: Byte[] and EventHandler.
  3. Next, we switch to the Class List to find out more. The class list gives us a fuller picture of what's in the snapshot.

Class List


We're interested in objects which have been created since the baseline snapshot, so we can look at classes which have increased in size in the second snapshot. We therefore sort by Size Diff in decreasing order.

The Byte[] class has been placed at the top of the list, with an increase of over 10 million bytes. We want to understand why there is such a large increase, so we load the Instance Categorizer for the Byte[] class by clicking the Instance retention graph icon.

Instance Categorizer



We can now see the source of the leaks:
  1. The Byte[] arrays belong to the Worker objects.
  2. The Worker objects were never garbage collected because the EventHandler in the Master object was still holding on to them. The cause is line 21 of the code (p.Report += OnReport), where each Worker subscribes to the Master object's event and never unsubscribes.
Having discovered the source, it is now easy to remove the leak. You just have to unsubscribe from the event (p.Report -= OnReport) at the time of disposal of each Worker.
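
The retention mechanism itself is not specific to C#. As a rough illustration only, here is a minimal Python analogue (not the C# fix, and not part of the profiled application): a publisher that stores handlers keeps every subscriber alive until the subscriber unsubscribes. The names subscribe, unsubscribe, on_report and close are hypothetical, chosen to mirror the listing above.

```python
import gc
import weakref


class Master:
    """Publisher: the handler list plays the role of the Report event."""

    def __init__(self):
        self.handlers = []

    def subscribe(self, handler):
        self.handlers.append(handler)

    def unsubscribe(self, handler):
        self.handlers.remove(handler)


class Worker:
    """Subscriber: its bound method, stored by Master, references the Worker."""

    def __init__(self, master):
        self.data = bytearray(10000)
        self.master = master
        master.subscribe(self.on_report)

    def on_report(self):
        pass  # report status

    def close(self):
        # The step that was missing in the leaking version: unsubscribe.
        self.master.unsubscribe(self.on_report)


master = Master()

# Case 1: drop our reference without unsubscribing. The Worker stays alive.
leaked = Worker(master)
leaked_ref = weakref.ref(leaked)
leaked = None
gc.collect()
print("still alive without unsubscribe:", leaked_ref() is not None)

# Case 2: unsubscribe first. The Worker can now be reclaimed.
released = Worker(master)
released_ref = weakref.ref(released)
released.close()
released = None
gc.collect()
print("reclaimed after unsubscribe:", released_ref() is None)
```

The weak references are only there to observe whether each Worker is still reachable; they do not keep the objects alive themselves.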

Summary

There are many other features inside ANTS which I did not have to use today because I have already found the source of the leaks. If you like, you can try out ANTS yourself. There is a 2-week trial version at Redgate's website.

All in all, I like the ANTS tool because it has helped me to solve hard-to-understand leaks in the C# projects that I was involved in. It is good for both C# executables and DLLs. I have also used it successfully with managed C++, since the tool supports the .NET development environment.

Sunday, 5 February 2017

Memory Leaks in C#

Some people wrongly assume that there are no memory leaks for C# applications since .NET has an automatic garbage collector. But there is an easy way to understand how memory leaks can still occur.

How? Just consider the rate of change in memory over time for a C# application. As long as the rate of memory allocation is faster than the rate of memory recovery (which is what the garbage collector does), all our available memory will eventually be allocated.

As proof, look at the following memory profile of a C# application that is leaking memory at a rate of 100K bytes per second.



Code Listing

Here is the code with the leak:

    class Program
    {
   
        private static Queue queue = new Queue();
        static void Main(string[] args)
        {
            int factor = 100;
            int i=0;
            do
            {
                Thread.Sleep(1000);
                var v = new byte[factor * 1024];
                queue.Enqueue(v);
                v = null;
                Console.WriteLine("i={0}", ++i);
            } while (true);
         
        }
    }


Notice line 14 (v = null;), where the programmer wanted to let the Garbage Collector (GC) know that it is all right to reclaim the object by setting the variable to null. But the GC will not be able to do so because there is still another reference to the object inside the queue container.

When a container holds references longer than necessary, memory consumption will increase and (in this example) eventually result in an OutOfMemoryException.
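
The same effect can be observed in other garbage-collected languages. The following is a minimal sketch in Python (an analogue, not the C# program above); the Block class and the variable names are made up for the illustration. It shows that clearing the local variable does nothing while the container still holds a reference, and that the object only becomes reclaimable once it leaves the container.

```python
import gc
import weakref
from collections import deque


class Block:
    """Hypothetical stand-in for the byte[] buffers in the listing above."""

    def __init__(self, size):
        self.payload = bytearray(size)


queue = deque()

block = Block(100 * 1024)
watcher = weakref.ref(block)   # lets us observe the object without keeping it alive
queue.append(block)

block = None                   # same idea as "v = null;" in the C# code
gc.collect()
alive_while_queued = watcher() is not None
print("alive while still in the queue:", alive_while_queued)

queue.clear()                  # drop the container's reference
gc.collect()
print("reclaimed after leaving the queue:", watcher() is None)
```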

In a future post, I hope to be able to show a more complicated memory leak that occurs with event handlers.

Friday, 20 January 2017

Python Interpreter: Writing functions

In my previous post, I showed how to get help from the Python interpreter.

Today, let's learn more about the Python interpreter by writing functions.

We are going to write a Python function to run a periodic task.

First, our periodic task is a function to print the current time:

import time

def printTime():
  print(time.asctime())


Continuation Lines

You may have noticed that the printTime() function consisted of 2 lines, and may ask how to input the second line into the interpreter.

The answer is that the interpreter knows that you are going to need a second line of input when it sees the colon (":") at the end of the line. So the interpreter will automatically give a "continuation line prompt" (shown by the 3 dots "..."). In the continuation line, you can key in "  print(time.asctime())", including the leading spaces for indentation.

When you are done and you don't want any more continuation lines, just hit "Return" until you get back the normal Python prompt as shown below. 

>>> def printTime():
...   print(time.asctime())
...

Continuation lines work for if-then-else and other Python control flow structures as well.
Once in a while, you may need a continuation line outside of functions/control flow. For these situations, just end the line with the backslash character ‘\’ to get the continuation line prompt.

Next, how do we schedule a periodic task?
Googling led me to the following code, which I was not familiar with:

import sched, time
s = sched.scheduler(time.time, time.sleep)

def periodic(scheduler, interval, action, actionargs=()):
  scheduler.enter(interval, 1, periodic, (scheduler, interval, action, actionargs))
  action(*actionargs)

periodic(s, 5, printTime)

In particular, I did not know what the s object at line 2 was.
Furthermore, when I ran the code, the time was printed only once, instead of periodically with an interval of 5 seconds (as expected from the periodic(s, 5, printTime) call at line 8).
To solve this, I will use the interpreter as shown in the next section.

Finding Out The Type

Here is what I will key-in to the interpreter:

>>> print(type(s))
<class 'sched.scheduler'>

The interpreter reveals that it is a sched.scheduler object.

Reading the help for the scheduler class led me to a method called "run()". Will this method kick start the periodic tasks?

We can find out quickly with the interpreter by entering the following line:
>>> s.run()


It works!

As you can see, the interpreter allowed me to experiment and find out how to kick-start the scheduler. This is certainly more flexible and faster than the "edit-compile-run" cycle, which I discussed in an earlier post.

Explanation of Scheduler Object

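For those who would like to understand how a scheduler is used to implement a periodic task, the trick is to have the task re-schedule itself at the specified interval. This is done at line 5, where periodic() calls scheduler.enter() to queue itself again. The queued events only start running once s.run() is called.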
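
To make the re-scheduling idea concrete, here is a minimal self-contained sketch. It is a variation on the code above, not the original: the repeats parameter and the run_times list are additions of mine so that the example stops by itself after a few runs, and the interval is shortened to 0.2 seconds.

```python
import sched
import time

run_times = []  # hypothetical bookkeeping, only used to check how often the task ran


def print_time():
    run_times.append(time.time())
    print(time.asctime())


def periodic(scheduler, interval, action, repeats):
    """Run action now, then queue this function again until repeats is used up."""
    action()
    if repeats > 1:
        # Re-schedule ourselves: this is what makes the task periodic.
        scheduler.enter(interval, 1, periodic,
                        (scheduler, interval, action, repeats - 1))


s = sched.scheduler(time.time, time.sleep)
periodic(s, 0.2, print_time, 3)  # first run happens immediately, the next one is queued
s.run()                          # blocks until the event queue is empty
```

Without the final s.run() call, only the first timestamp is printed, which matches what I saw earlier.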

Tuesday, 17 January 2017

Python Interpreter: Help System

Last time, I talked about using the Python interpreter to calculate the cosine rule. Today, I intend to cover the interpreter in more detail.

One of the benefits of the interpreter is that it helps you to learn how to use an unfamiliar Python function. For instance, let's say I have forgotten whether the math.cos() function works in degrees or radians. What can I do?

Of course, one way is to look up the Python help documentation. But there is another convenient way, which is to type the following command in the interpreter:
>>> help(math.cos)
cos(x)

Return the cosine of x (measured in radians).

Straight away, I get the answer.

Note that you need to import the math module before trying today's examples, as follows:
>>> import math 

Object Dir

Next, suppose I want to find out which function computes the natural logarithm. My guess is that the math module should have such a function. So, I can type the following command to see what is inside the math module:
>>> dir(math)
['__doc__', '__loader__', '__name__', '__package__', '__spec__'
, 'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh'
, 'ceil', 'copysign', 'cos', 'cosh', 'degrees', 'e', 'erf', 'erfc'
, 'exp', 'expm1', 'fabs', 'factorial', 'floor', 'fmod', 'frexp', 'fsum'
, 'gamma', 'gcd', 'hypot', 'inf', 'isclose', 'isfinite', 'isinf', 'isnan'
, 'ldexp', 'lgamma', 'log', 'log10', 'log1p', 'log2', 'modf', 'nan', 'pi'
, 'pow', 'radians', 'sin', 'sinh', 'sqrt', 'tan', 'tanh', 'tau', 'trunc']


The result shows 4 possible logarithm functions. So I try the first one as follows:
>>> help(math.log)
log(x[, base])

Return the logarithm of x to the given base.
If the base not specified, returns the natural logarithm (base e) of x.

Jackpot! It is easy, right?

Another way to do this is to get the help for the entire math module as follows:
>>> help(math)

This will get you the "docstrings" (explained below) of all the functions inside the math module.

Python Docstrings

The help() function works because in Python, every module/function has an attribute called __doc__, which typically contains a brief description of the module/function.

This is also known as the "docstring", which is what is printed out by the help() function.

I can also print out the "docstring" on my own with the following command:
>>> print(math.cos.__doc__)
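
Your own functions get a docstring in the same way. The following is a small sketch; to_fahrenheit is a made-up example function, not something from the math module.

```python
import math


def to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# The string literal on the first line of the body becomes the __doc__ attribute.
print(to_fahrenheit.__doc__)

# Built-in functions expose their documentation through the same attribute.
print(math.cos.__doc__)
```

Typing help(to_fahrenheit) in the interpreter should display the same text, together with the function's signature.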

Python __Dict__

Most objects in Python have a special attribute named "__dict__". This stores the object's attributes.
Since classes are also objects, x.name is roughly equivalent to trying the following in order: x.__dict__['name'], type(x).name.__get__(x, type(x)), type(x).name. (One exception: data descriptors defined on the class, such as properties, take precedence over the instance's __dict__.)
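
Here is a small sketch of that lookup order for the common case of plain attributes. The Point class is invented for the illustration.

```python
class Point:
    dimensions = 2          # class attribute, stored in Point.__dict__

    def __init__(self, x, y):
        self.x = x          # instance attributes, stored in self.__dict__
        self.y = y


p = Point(3, 4)

print(p.__dict__)                   # only the instance attributes x and y
print('dimensions' in p.__dict__)   # False: not stored on the instance
print(p.dimensions)                 # found on the class instead

# Assigning through the instance creates an entry in the instance's __dict__,
# which then shadows the class attribute for this object only.
p.dimensions = 3
print(p.__dict__['dimensions'], Point.dimensions)
```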

In a future post, I would like to discuss how to write functions using the interpreter.

Friday, 13 January 2017

Python Interpreter: How to use

One of the most useful features in Python is its interpreter. In my view, it is what makes Python so special as a language. With the Python interpreter, you can be much more productive than with other languages such as C++ or C#.

For instance, suppose you are required to compute the following mathematical formula (the cosine rule):

c = sqrt(a^2 + b^2 - 2ab cos(C))

You would prefer to use a calculator rather than use C++, right?

If you really want to use C++, there are (seriously) quite a lot of steps needed such as:

  1. Create a new Visual Studio solution
  2. Create a C++ project using the Wizard
  3. Open the editor to start coding.
    • Include the correct header files (e.g. <cmath>, <iostream>)
    • Code the formula in C++ 
    • Print out the final result using std::cout, std::endl, etc
  4. Compile the code
  5. If the compile fails, try to figure out where the syntax error is. Repeat Step 3.
  6. Run the executable
  7. If the final result is wrong, Repeat Step 3.

This is commonly known as the "edit-compile-run" cycle and it can take quite a lot of effort compared to using a calculator.

How about Python? Well, you can actually think of the Python interpreter as a super-calculator. All you need to do is to type the following lines inside the interpreter:

>>> from math import cos, sqrt
>>> a=1
>>> b=2
>>> C=0.7
>>> print('Answer: ', sqrt(a*a+b*b-2*a*b*cos(C)))

Answer:  1.3930654151410284

This gives you the answer quickly.

What if you typed the wrong value for the variable b?

No sweat. Just type the correct value inside the interpreter and the old value will be changed instantly. There is no need to open up the editor at all.

Note that you can recall the previous line(s) inside the interpreter by pressing the "up-arrow" key.

By breaking away from the "edit-compile-run" cycle, you are able to accomplish your tasks faster. This means you save time and become more productive.

In the event that you have to reuse your tested code, you can save it into a script. Then you can copy-and-paste from the script into the interpreter whenever you need to run it again.
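
As a sketch of what such a saved script could look like, the interpreter session above can be wrapped in a function. The function name third_side and the idea of saving it as, say, cosine_rule.py are my own choices for the example.

```python
from math import cos, sqrt


def third_side(a, b, angle_c):
    """Return the side opposite angle_c (in radians) using the cosine rule."""
    return sqrt(a * a + b * b - 2 * a * b * cos(angle_c))


if __name__ == "__main__":
    # Same values as in the interpreter session above.
    print('Answer: ', third_side(1, 2, 0.7))
```

Keeping the calculation in a function means that a wrong value for b only needs a new call, not a retyped formula.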

In my future posts, I would like to show you how to get help from the interpreter and how to write functions using the interpreter. That is, writing functions without using an editor. The intention is to break away from the "edit-compile-run" cycle so that you get better productivity.

Wednesday, 4 January 2017

Secure Sockets Layer (SSL) Introduction

Although its modern name is Transport Layer Security (TLS), the protocol is still more commonly known as Secure Sockets Layer (SSL).

SSL is a cryptographic protocol that is widely used to secure communications on the Internet. It has 2 functions:
  1. Identification: making sure the computer you are talking to is the one you trust.
  2. Encryption: preventing eavesdropping of the information sent from one computer to another.

Identification

In today's Internet era, identification is crucial.

Why?

The nature of the Internet means that your information hops through several computers before it reaches the intended destination. Any of these computers could pretend to be your destination and trick you into sending them sensitive information.

The solution is to use a proper Public Key Infrastructure (PKI), which means that the destination computer has an SSL Certificate from a trusted SSL Certificate Authority (CA).

The SSL Certificate authenticates the identity of the destination computer. This is a prerequisite before encryption of information can take place.

Encryption

SSL provides data encryption. Many network protocols were designed in the early days of the Internet, when there was no data encryption. Nowadays, data encryption is crucial and SSL is used to secure these protocols (e.g. HTTP -> HTTPS, SMTP -> SMTPS).

If you see a URL starting with http, you know that your data is sent unencrypted. But if you see that it starts with https, you know that your data is encrypted using SSL.

There are 2 types of encryption: 

  1. Symmetric: when you and the destination computer use a common secret key for encryption/decryption.
  2. Asymmetric: when the key used to decrypt is different from the key used to encrypt. This is used in situations when there is no common secret key, e.g. an Internet transaction with Amazon.

Asymmetric encryption is used by PKI because it allows complete strangers to start a secret communications channel. But since it is computationally slower than symmetric encryption, a switch to symmetric encryption takes place as soon as the communications channel is secured. In SSL, this happens after the SSL handshake, which we will discuss next.

SSL Handshake

When a URL starts with https, your browser will ask the URL's server for its SSL certificate through a "Client Hello" (CH) message. This is the start of the SSL handshake.

In the CH message, your browser provides a list of cryptographic algorithms that it supports. 

For example, the list covers 3 categories of algorithms: key exchange (e.g. RSA), symmetric encryption (e.g. AES256) and hashing (e.g. SHA512).

The server will choose one algorithm from each of the 3 categories and inform the client through a "Server Hello" (SH) message. This is the Key Method Exchange phase, where the objective is to agree on a set of cryptographic algorithms to use for this secured communication channel/session.

Let's assume the server chooses the RSA-AES256-SHA512 combination. 
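
To get a feel for the kind of list a client offers, you can ask Python's standard ssl module which cipher suites its default client configuration has enabled. This is only a sketch: the names printed depend on your Python and OpenSSL versions, and they will not necessarily include the combination assumed above.

```python
import ssl

# Default client-side settings: certificate verification and hostname checks are on.
context = ssl.create_default_context()

ciphers = context.get_ciphers()
print("Number of cipher suites enabled:", len(ciphers))

# Each entry describes one combination of algorithms the client is willing to use.
for cipher in ciphers[:5]:
    print(cipher["name"], "-", cipher["protocol"])
```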

In addition, the SH message contains the server's SSL certificate, which also contains the server's public key. Your browser then uses the SSL certificate to verify the server's identity.

Once the server's identity is ascertained, your browser generates a random value called the "pre-master secret", encrypts it with the server's public key and sends it to the server in a message called the "Client Key Exchange" (CKE). The server uses its private key to decrypt the message and obtain the "pre-master secret". (The "pre-master secret" is used to derive symmetric encryption keys, which are faster to use than the asymmetric encryption used up to this point.)

Following the CKE message, your browser sends out a message called "Change Cipher Spec" (CCS). The CCS informs the server that the browser is switching to symmetric cryptography. The server prepares for this by using the "pre-master secret" to generate the "master secret", which both sides know. The "master secret" is used to compute the keys for encryption and hashing.

Your browser goes through the same process of computing the keys for encryption and hashing. Once the keys are ready, it sends the "Finished message" (FM), whose integrity is protected by using the newly computed hash key. This is a clever message that proves that no one tampered with the handshake and it proves that we know the key. The FM is the last message your browser sends in the SSL Handshake.
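
The following sketch illustrates the idea behind the last two steps: both sides start from the same "pre-master secret", derive the same keys independently, and the Finished message is protected by a keyed hash over the handshake. It is deliberately simplified and is NOT the real TLS key derivation: the derive_keys and finished_tag functions, the labels, the use of HMAC-SHA256 and the handshake_messages placeholder are all assumptions made for illustration. The client_random and server_random values stand for the random numbers that the two hello messages carry in real TLS, which this post does not cover.

```python
import hashlib
import hmac
import os


def derive_keys(pre_master_secret, client_random, server_random):
    """Simplified stand-in for the TLS key derivation (not the real TLS PRF)."""
    seed = client_random + server_random
    master_secret = hmac.new(pre_master_secret, b"master secret" + seed,
                             hashlib.sha256).digest()
    encryption_key = hmac.new(master_secret, b"encryption key" + seed,
                              hashlib.sha256).digest()
    hash_key = hmac.new(master_secret, b"hash key" + seed,
                        hashlib.sha256).digest()
    return encryption_key, hash_key


def finished_tag(hash_key, handshake_messages):
    """Keyed hash over everything exchanged so far in the handshake."""
    return hmac.new(hash_key, handshake_messages, hashlib.sha256).digest()


# Values that both sides know after the Client Key Exchange message.
pre_master_secret = os.urandom(48)
client_random = os.urandom(32)
server_random = os.urandom(32)
handshake_messages = b"ClientHello|ServerHello|ClientKeyExchange"  # placeholder bytes

# Browser and server run the same derivation independently.
client_keys = derive_keys(pre_master_secret, client_random, server_random)
server_keys = derive_keys(pre_master_secret, client_random, server_random)
print("Both sides hold the same keys:", client_keys == server_keys)

# The browser sends its tag; the server recomputes it with its own hash key.
tag = finished_tag(client_keys[1], handshake_messages)
accepted = hmac.compare_digest(tag, finished_tag(server_keys[1], handshake_messages))
tampered = hmac.compare_digest(tag, finished_tag(server_keys[1], handshake_messages + b"!"))
print("Untampered handshake accepted:", accepted)
print("Tampered handshake accepted:", tampered)
```

Because the tag depends on both the derived key and every handshake message, a mismatch tells the server that either the handshake was altered or the other side does not know the key.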


Side note: There is also an option for the server to ask the client for its SSL certificate if it so desires. The server makes this request with a separate Certificate Request message that follows its SH message.

Reference: 

The First Few Milliseconds of an HTTPS Connection
http://www.moserware.com/2009/06/first-few-milliseconds-of-https.html