Saturday, August 27, 2016

Three Minutes Daily Vim Tip: Open Specific Location

Say you are compiling your code, and the compiler reports an error at line 77 of your main.c file. You want to go to this specific line of this file. You could do this in two steps, i.e., open up the file and then go to the line.

$ vim main.c
:77

Well, there is a slightly easier way to do this in one command:
$ vim +77 main.c

This command will open up main.c file at line 77.

Next, say you want to open up the file and go to your function main(). To do this, you would normally do
$ vim main.c
/main(

Again, there is a single command to do this:
$ vim +/main\( main.c

Note the backslash \, which tells the shell to pass the parenthesis ( to Vim as a literal character.

Happy Vimming!

Friday, August 26, 2016

Conditional Break with GDB

Say you want to set up a breakpoint that triggers only under a given condition. To do this in gdb, simply enter the following command:
(gdb) break main.c:80 if x == 0

This will break at line 80 of main.c only when the variable x is equal to 0 in that context. This is very handy when debugging for certain conditions.

Monday, August 22, 2016

VirtualBox: Share Files between Mac OS X Host and Windows Guest

VirtualBox supports a Shared Folders feature so that one can access files bidirectionally between the host and guest systems. In this post, I will go over how to share files between a VirtualBox Windows guest and a Mac OS X host.




First, open up VirtualBox, and select your Windows virtual machine. Click on the Settings icon to open up the settings window, and click on Shared Folders tab. 




On the right hand side, click on a small button that reads Adds new shared folder. On the drop down menu for Folder Path, choose Other...




Finally, simply choose the folder in your host system (i.e., Mac OS X) which will be shared with the guest system (i.e., Windows). Check Auto-mount option.




Now, the setting is complete. Fire up your virtual machine, and you should be able to see your shared drive as a network drive when you open up My Computer window. The shared drive address should be \\vboxsrv

Sunday, August 21, 2016

Quick gdb Tip: Enhance Debugging Experience with TUI or CGDB

I have been using gdb for quite some time, for years in fact. Let's see... I first learned gdb in a sophomore Computer Systems Engineering course, back in 2007. Wow, it's been almost 10 years now.

And yet, here I am, learning new features every day. In fact, I learned one feature in gdb today that will change my development completely: text user interface.

Try it yourself, if you don't already know.
$ gdb a.out -tui -q

I wonder why my class in 2007 never taught this feature. Maybe this feature was non-existent then?

Another option is to use cgdb. You could download the latest from git directly and compile:
$ git clone https://github.com/cgdb/cgdb.git
$ cd cgdb
$ ./autogen.sh
$ ./configure --prefix=/usr/local
$ make
$ sudo make install
$ cgdb -q a.out

Thursday, August 18, 2016

Three Minutes Daily Vim Tip: Split Views

To open up a split view, type in
:sp file_to_open.c
:vsp file_to_open.c
The first will open up a horizontal split view, while the latter will open up a vertical one.

To move your cursor between views, type in
<Ctrl> [w] + [h]
<Ctrl> [w] + [j]
<Ctrl> [w] + [k]
<Ctrl> [w] + [l]
where h,j,k,l will move into the corresponding directions, i.e., left, down, up, and right.

To change size automatically, type in
<Ctrl> [w] + [=]

For manual size manipulation, type in
:resize 60
:vertical resize 60
where the first sets the height of the current view to 60 lines (moving the horizontal divider), while the second sets its width to 60 columns (moving the vertical divider).

To quit all views at once, type in
:wqa
:qa!
where the first will save and exit all views, while the latter will quit all views without saving.

Lastly, for easy navigation between split views, append the following into ~/.vimrc file:
nnoremap <C-J> <C-W><C-J>
nnoremap <C-K> <C-W><C-K>
nnoremap <C-L> <C-W><C-L>
nnoremap <C-H> <C-W><C-H>

You should now be able to navigate between split views with
<Ctrl> [h]
<Ctrl> [j]
<Ctrl> [k]
<Ctrl> [l]

Wednesday, August 17, 2016

Three Minutes Daily Vim Tip: Tabs

Vim is a controversial editor: some love it, while some hate it. Vim is indeed difficult at first, with a very steep learning curve. In order to help people, including myself, learn Vim, I am going to post a series of short Vim tips. Today is the first of the series. If you are clueless with Vim, learn the very basics first.

Starting with version 7.0, Vim supports tabs. In Vim's normal mode, type in
:tabe some_file_to_open.c

You will immediately notice that the file is opened up in a new tab. To switch between tabs, type in
:tabn
:tabp

I have to admit that the tab switching commands are too long. Well, it's Vim: we can always create a one-key shortcut.

$ echo "nnoremap <F2> :tabp<CR>" >> ~/.vimrc
$ echo "nnoremap <F3> :tabn<CR>" >> ~/.vimrc

The two lines will map <F2> and <F3> keys for switching to previous and next tab, respectively.

Tuesday, August 16, 2016

Some Useful Commands in GDB

I would like to list common useful commands in gdb:

(gdb) b main
set up a break point at main function after function prologue

(gdb) b *main
set up a break point right at the address of main, so that when $pc points to *main, it will break

(gdb) delete 1
delete break point 1

(gdb) p/d i
print the content of variable i as a signed decimal

(gdb) p/u j
print the content of variable j as an unsigned decimal

(gdb) p/x k
print the content of variable k in hex

(gdb) x/5i $pc
print next 5 instructions in assembly

(gdb) x/s buffer
print string at address pointed by variable buffer

(gdb) x/xg 0x123456789abcdef0
print 64-bit value in hex at address 0x123456789abcdef0

(gdb) x/10wx {void*}$rbp
print 10 consecutive 32-bit values in hex starting from the address stored in the 8-byte slot that $rbp points to; use x/10wx $rbp to start at the address held in $rbp itself

(gdb) info reg
examine all the register values

(gdb) r arg1 arg2
run the program with command-line argument arg1 and arg2

(gdb) si
execute one machine instruction; if it is a function call, step into the subroutine

(gdb) ni
execute one machine instruction; if it is a function call, do not step into the subroutine

(gdb) step
execute one line of C/C++ code; step into function

(gdb) next
execute one line of C/C++ code; step over function

(gdb) p func(arg1, arg2)
print return value by calling function func with arguments arg1 and arg2

(gdb) finish
continue execution until the end of current frame (i.e., subroutine)

(gdb) disass main
show assembly instructions of main function

(gdb) list main.c:37
display source code of main.c file at around line 37

(gdb) display/i $pc
keep displaying next machine instruction

(gdb) until 37
continue execution until the specified line
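
The p/d, p/u, and p/x commands above do not change the variable; they only show the same bits in different notations. Here is a minimal C++ sketch of that idea, assuming a 32-bit int. The helper name threeViews is made up for this illustration and has nothing to do with gdb itself.

```cpp
#include <cstdio>
#include <string>

// Format one int the way p/d, p/u and p/x would show it:
// signed decimal, unsigned decimal, and hex.
// threeViews is a hypothetical helper; a 32-bit int is assumed.
std::string threeViews(int value) {
  char buf[64];
  unsigned bits = static_cast<unsigned>(value);  // same bit pattern, unsigned view
  std::snprintf(buf, sizeof buf, "%d %u 0x%x", value, bits, bits);
  return buf;
}
```

With a 32-bit int, threeViews(-1) yields "-1 4294967295 0xffffffff", which is why a negative value can look like a huge number under p/u.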

Monday, August 15, 2016

Changing File and Directory Access Mode

In Unix-like systems, files and directories have an access mode associated with them, which specifies who can do what with the file. The access modes include read (r), write (w), and execute (x) for its owner, group members, and others. Because there are three flags, there are 2*2*2 = 2^3 = 8 possible configurations of read, write, and execute. Each flag is represented by one bit of a three-bit number as follows: read corresponds to 4 (=100b), write corresponds to 2 (=010b), and execute corresponds to 1 (=001b). Here, the trailing b indicates binary representation.

For example, if the file owner has all the read, write, and execute access, then it would be represented as rwx or 7 (=111b). Similarly, if only read and execute flags are set (enabled), then it would be represented as r-x or 5 (=101b), and so on.

Because we have these three access flags for each of the owner, group members, and others, we need a 3*3 = 9-bit binary number to represent all 8*8*8 = 2^(3*3) = 2^9 = 512 possible configurations. For example, if the owner has read and write access, group members have write access, and others have execute access, then it would be 110 010 001b, where the leading three bits represent owner access, the next three bits represent group access, and the last three bits represent others' access. Because it is so cumbersome to describe the access mode in terms of 9 bits, Unix uses octal representation, i.e., 110 010 001b = 621o, where the trailing o indicates octal representation.
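
The mapping from nine flags to three octal digits can be sketched in a few lines of C++. This is only an illustration: permissionBits is a made-up helper name, and the input is assumed to be the nine permission characters without the leading file-type character.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Convert a 9-character permission string such as "rw--w---x" into its
// numeric mode. Each group of three characters (owner, group, others)
// becomes one octal digit: r = 4, w = 2, x = 1.
// permissionBits is a hypothetical helper, not part of any library.
unsigned permissionBits(const std::string& perms) {
  if (perms.size() != 9)
    throw std::invalid_argument("expected 9 characters, e.g. rwxr-xr-x");
  const char expected[] = "rwx";
  unsigned mode = 0;
  for (std::size_t i = 0; i < 9; ++i) {
    mode <<= 1;  // make room for the next bit
    if (perms[i] == expected[i % 3])
      mode |= 1;  // this flag is enabled
    else if (perms[i] != '-')
      throw std::invalid_argument("unexpected character in permission string");
  }
  return mode;
}
```

For instance, permissionBits("rw--w---x") evaluates to 0621, matching the 110 010 001b example.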

We can inspect the current access mode with ls and change it with chmod. For example,
$ touch test
$ mkdir test_dir
$ ls -ld test test_dir
-rw-r--r--@ 1 linuxnme  staff   0 Aug 15 18:54 test
drwxr-xr-x@ 2 linuxnme  staff  68 Aug 15 18:57 test_dir/

The first dash means it is a regular file. For a directory, the first letter will be d, as can be seen with test_dir. The next three letters represent the access mode of the owner, who in this case is linuxnme. Again, a dash means no access, so the test file gives its owner read and write access only, while test_dir gives its owner read, write, and execute access. For directories, execute access means permission to search within the directory.

The next three letters represent the access mode of the group, which in this case is staff. The last three letters represent the access mode for others.

To change these access modes, one can use chmod commands:
$ chmod u+x test
$ ls -l test
-rwxr--r--@ 1 linuxnme  staff   0 Aug 15 18:54 test

$ chmod u-r test
$ ls -l test
--wxr--r--@ 1 linuxnme  staff   0 Aug 15 18:54 test

$ chmod u+rx-w test
$ ls -l test
-r-xr--r--@ 1 linuxnme  staff   0 Aug 15 18:54 test

Here, u represents the owner or the user, linuxnme, and the +/- sign indicates whether to enable or disable the flags that follow, relative to the current access. For example, u+rx-w means to enable read and execute access but disable write access. In a similar manner, one can use g for group members and o for others. For example, we could do
$ chmod u-r+w,g+w,o-r+x test
$ ls -l test
--wxrw---x@ 1 linuxnme  staff   0 Aug 15 18:54 test

We can also use the = sign to set the flags to exactly the given value. For example,
$ chmod ug=rw,o=x test
$ ls -l test
-rw-rw---x@ 1 linuxnme  staff   0 Aug 15 18:54 test

However, it must be obvious that this method requires a bit of typing. Because programmers are lazy, chmod also accepts the octal representation mentioned above. That is,
$ chmod 744 test
$ ls -l test
-rwxr--r--@ 1 linuxnme  staff   0 Aug 15 18:54 test

This way makes it very easy, only requiring 3 digits to set access mode in any of the 512 configurations. 

Lastly, there are some other extra access modes, which are setuid, setgid, and sticky. Here, setuid access means that executing the file with setuid will run with effective uid of the owner of the file. For example, if the file is executable and owned by root, then running this file from any other user will have effective uid as root, which would be 0. setgid is similar, but pertains to the effective group id. Sticky bit is for shareable executable files and directories, but I won't go into that in this post.

In chmod, u+s will enable setuid and g+s will enable setgid. Also, one can prepend another octal digit to represent these modes: 4 for setuid, 2 for setgid, and 1 for sticky. For example,
$ chmod 4755 test
$ ls -l test
-rwsr-xr-x@ 1 linuxnme  staff   0 Aug 15 18:54 test
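
To see how the extra leading digit shows up in the ls output, here is a rough C++ sketch that renders a numeric mode as the nine permission letters. The helper name modeString is made up for this example, and the file-type character and the trailing @ are left out.

```cpp
#include <string>

// Render a numeric mode, including the optional leading
// setuid/setgid/sticky digit, as the nine letters that ls -l prints.
// modeString is a hypothetical helper used only for this sketch.
std::string modeString(unsigned mode) {
  std::string out = "rwxrwxrwx";
  for (int i = 0; i < 9; ++i)
    if (!(mode & (0400u >> i)))  // walk from owner-read down to others-execute
      out[i] = '-';
  // The special bits reuse the x slots: lowercase if x is also set,
  // uppercase if it is not.
  if (mode & 04000) out[2] = (mode & 0100) ? 's' : 'S';  // setuid
  if (mode & 02000) out[5] = (mode & 0010) ? 's' : 'S';  // setgid
  if (mode & 01000) out[8] = (mode & 0001) ? 't' : 'T';  // sticky
  return out;
}
```

Here modeString(04755) gives "rwsr-xr-x", the same letters that ls -l shows after chmod 4755.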

Consider the code below:
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

int main() {
  int uid = getuid();
  int euid = geteuid();
  printf ("uid: %d\neuid: %d\n", uid, euid);
  return 0;
}

If you compile the code and run it, the program will output your uid and effective uid. In my case,
linuxnme $ ./a.out
uid: 501
euid: 501

However, if I enable setuid access and run it as a different user, I obtain effective uid equivalent to the owner of the file:
linuxnme $ chmod u+s a.out
linuxnme $ ls -l a.out
-rwsr-xr-x@ 1 linuxnme  staff  8528 Aug 15 18:24 a.out
linuxnme $ su someuser
Password:
someuser $ ./a.out
uid: 502
euid: 501

For more details, look up chmod manual by
$ man chmod

Sunday, August 14, 2016

Transition from C to Assembly: The Basics (x86 64bit)

In this post, I will go over how a simple C program translates to x86 64bit assembly.

Consider classical helloworld.c program below:


#include <stdio.h>
int main() {
  printf("hello world\n");
  return 0;
}

Let us compile it.
$ gcc -g helloworld.c

Now, we will output its assembly using gdb:
$ gdb a.out -q
(gdb) disass main
Dump of assembler code for function main:
   0x0000000100000f6b <+0>: push   rbp
   0x0000000100000f6c <+1>: mov    rbp,rsp
   0x0000000100000f6f <+4>: lea    rdi,[rip+0x2c]        # 0x100000fa2
   0x0000000100000f76 <+11>: call   0x100000f82
   0x0000000100000f7b <+16>: mov    eax,0x0
   0x0000000100000f80 <+21>: pop    rbp
   0x0000000100000f81 <+22>: ret
End of assembler dump.

Note that you may get a different result, depending on your compiler and platform OS. I am compiling with gcc 5.3.0 running on 64-bit Mac OS X. On other Unix systems, the result should be very similar, although if you are running on a 32-bit computer, you will be getting ebp, esp, etc. instead of rbp, rsp, etc. Lastly, if you want to switch between Intel and AT&T style assembly code, please refer to my previous post.

Let us go over each instruction step by step.
<+0> push rbp simply pushes the value of rbp, the frame pointer or base pointer, onto the stack. rbp stores the base frame address for the function, and because main() has just been called, it saves the caller's base frame address onto the stack, so that the caller's frame can be restored when main() returns. The caller, of course, is the startup code that initializes the program before invoking main().

<+1> mov rbp, rsp will move rsp into rbp. Now that rbp was saved from the previous instruction, we are now safe to store the current frame pointer into rbp. Note that rsp is the stack pointer, which stores the current address of the stack. The value of rsp will increment for each pop instruction and decrement for each push instruction---remember that stack grows from the high-address to low-address, so each push will decrement the address of the current stack pointer. After this instruction, rsp and rbp hold the same value, which is the base of the main() frame.

<+4> lea rdi, [rip+0x2c] will load the address 0x100000fa2 into the rdi register, which holds the first argument of the upcoming call. Note that this address contains what we want to print, "hello world". To see this, simply run
(gdb) x/s 0x100000fa2
0x100000fa2: "hello world"

<+11> call 0x100000f82 will call the C library routine that prints out the string; the actual system call is made inside the library.

<+16> mov eax, 0x0 will move 0 to eax register, which will be return value from main().

<+21> pop rbp will pop the content of the memory address pointed by rsp into rbp. This content will simply be the previous value of rbp, pushed onto stack during instruction <+0>. Thus, it will restore the rbp value.

<+22> ret will finally return from the function by popping the value pointed by rsp into rip, which is the instruction pointer. Note that the caller who called main() has already stored this value onto the stack.
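
The push and pop behavior described for <+0>, <+1>, and <+21> can be mimicked with a toy model in C++. This is only a sketch of the direction in which the stack grows: ToyStack is a made-up type, memory is a small array of 8-byte slots, and rsp is a byte offset rather than a real address.

```cpp
#include <cstddef>
#include <vector>

// ToyStack is a hypothetical model of a downward-growing stack.
struct ToyStack {
  std::vector<unsigned long long> slots;  // 8-byte memory cells
  std::size_t rsp;                        // byte offset of the top of stack

  explicit ToyStack(std::size_t bytes) : slots(bytes / 8), rsp(bytes) {}

  void push(unsigned long long value) {
    rsp -= 8;                 // the stack grows toward lower addresses
    slots[rsp / 8] = value;
  }

  unsigned long long pop() {
    unsigned long long value = slots[rsp / 8];
    rsp += 8;                 // popping moves rsp back up
    return value;
  }
};
```

Starting from ToyStack s(64), a push moves s.rsp from 64 down to 56, and the matching pop returns the saved value and brings s.rsp back to 64, just as pop rbp undoes push rbp.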

Transition from C to C++: Rule of Three (C++98)

In the previous post, we looked at how we can declare a constructor and a destructor to take care of memory management behind the scenes, so that the programmer does not need to worry about memory allocation when using local objects. However, the code has some significant mistakes that need to be fixed. Let's see what will happen in the following code.


#include <cstdio>
#include <cstring>
#include <vector>
#include <cstdlib>

class Word {
  char* word;
public:
  Word() {
    word = NULL;
  }
  ~Word() {
    if (word != NULL)
      free(word);
  }
  void setWord(char *word) {
    if (this->word != NULL) {
      // +1 leaves room for the terminating '\0'
      this->word = (char*)realloc(this->word, strlen(word) + 1);
    }
    else {
      this->word = (char*)malloc(strlen(word) + 1);
    }
    strcpy(this->word, word);
  }
  char* getWord() {
    return word;
  }
};

int main (int argc, char **argv) {
  Word hello; // default constructor
  hello.setWord("hello");
  Word hi(hello); // implicit copy constructor

  return 0;
}

What possible errors will we encounter if we compile and execute this code? If you execute it, you will be surprised to see the output:

*** Error in `./a.out': double free or corruption (fasttop): 0x0000000001a5a010 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x77725)[0x7f9a8fa6f725]
/lib/x86_64-linux-gnu/libc.so.6(+0x7ff4a)[0x7f9a8fa77f4a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f9a8fa7babc]
./a.out[0x400771]
./a.out[0x400717]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f9a8fa18830]
./a.out[0x4005e9]

As the error message states, the problem is that upon exiting the main function, the destructors for both Word hello and Word hi are called. However, because the implicit copy constructor copies the object member by member, the word members of the two objects point to the same location. The destructor calls free(word); this is OK the first time, but when it is called the second time on the same pointer, the program crashes with the error message above.

It becomes clearer when we look at the disassembly from gdb.

Dump of assembler code for function main(int, char**):
   0x00000000004006b6 <+0>: push   rbp
   0x00000000004006b7 <+1>: mov    rbp,rsp
   0x00000000004006ba <+4>: push   rbx
   0x00000000004006bb <+5>: sub    rsp,0x38
   0x00000000004006bf <+9>: mov    DWORD PTR [rbp-0x34],edi
   0x00000000004006c2 <+12>: mov    QWORD PTR [rbp-0x40],rsi
   0x00000000004006c6 <+16>: mov    rax,QWORD PTR fs:0x28
   0x00000000004006cf <+25>: mov    QWORD PTR [rbp-0x18],rax
   0x00000000004006d3 <+29>: xor    eax,eax
   0x00000000004006d5 <+31>: lea    rax,[rbp-0x30]
   0x00000000004006d9 <+35>: mov    rdi,rax
   0x00000000004006dc <+38>: call   0x400734 <Word::Word()>
   0x00000000004006e1 <+43>: lea    rax,[rbp-0x30]
   0x00000000004006e5 <+47>: mov    esi,0x400884
   0x00000000004006ea <+52>: mov    rdi,rax
   0x00000000004006ed <+55>: call   0x400774 <Word::setWord(char*)>
   0x00000000004006f2 <+60>: mov    rax,QWORD PTR [rbp-0x30]
   0x00000000004006f6 <+64>: mov    QWORD PTR [rbp-0x20],rax
   0x00000000004006fa <+68>: mov    ebx,0x0
   0x00000000004006ff <+73>: lea    rax,[rbp-0x20]
   0x0000000000400703 <+77>: mov    rdi,rax
   0x0000000000400706 <+80>: call   0x40074a <Word::~Word()>
   0x000000000040070b <+85>: lea    rax,[rbp-0x30]
   0x000000000040070f <+89>: mov    rdi,rax
   0x0000000000400712 <+92>: call   0x40074a <Word::~Word()>
   0x0000000000400717 <+97>: mov    eax,ebx
   0x0000000000400719 <+99>: mov    rdx,QWORD PTR [rbp-0x18]
   0x000000000040071d <+103>: xor    rdx,QWORD PTR fs:0x28
   0x0000000000400726 <+112>: je     0x40072d <main(int, char**)+119>
   0x0000000000400728 <+114>: call   0x400570 <__stack_chk_fail@plt>
   0x000000000040072d <+119>: add    rsp,0x38
   0x0000000000400731 <+123>: pop    rbx
   0x0000000000400732 <+124>: pop    rbp
   0x0000000000400733 <+125>: ret   

In both <+80> and <+92>, destructor Word::~Word is called. However, they are exact copies, so we are trying to free char* word twice. Note that Word hello is constructed in <+38>, whose stack address is $rbp-0x30, and its entire content is copied over to $rbp-0x20 in <+60> and <+64>, which is Word hi.

So, how should we fix this? Well, we simply need to explicitly declare a copy constructor for class Word, in which we create a copy of the string pointed to by word and have the new object's word point to it. In fact, we also need to explicitly declare the copy assignment operator in a similar manner, because the implicit copy assignment operator will, again, make an exact copy, having two different objects point to the same string.

Below is the code that corrects these.



#include <cstdio>
#include <cstring>
#include <vector>
#include <cstdlib>

class Word {
  char* word;
public:
  Word() {
    word = NULL;
  }
  Word(const Word &that) {
    if (that.word == NULL) {
      this->word = NULL;
    } else {
      this->word = (char*)malloc(strlen(that.word) + 1); // +1 for the terminating '\0'
      strcpy(this->word, that.word);
    }
  }
  ~Word() {
    if (word != NULL)
      free(word);
  }
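  Word& operator=(const Word& that) {
    if (this == &that)  // self-assignment: nothing to do
      return *this;
    if (that.word == NULL) {
      free(this->word);  // release our old string; free(NULL) is a no-op
      this->word = NULL;
    } else {
      // +1 leaves room for the terminating '\0'
      this->word = (char*)realloc(this->word, strlen(that.word) + 1);
      strcpy(this->word, that.word);
    }
    return *this;
  }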

  void setWord(const char *word) {
    if (this->word != NULL) {
      this->word = (char*)realloc(this->word, strlen(word) + 1);
    }
    else {
      this->word = (char*)malloc(strlen(word) + 1);
    }
    strcpy(this->word, word);
  }
  const char* getWord() {
    return word;
  }
};

int main (int argc, char **argv) {
  Word hello; // default constructor
  hello.setWord("hello");
  Word hi(hello); // explicit copy constructor
  printf("hello: %s\n", hello.getWord());
  printf("hi: %s\n", hi.getWord());
  hi.setWord("hi");
  printf("hello: %s\n", hello.getWord());
  printf("hi: %s\n", hi.getWord());
  Word bye;
  bye = hello;
  printf("bye: %s\n", bye.getWord());
  bye.setWord("bye");
  printf("hello: %s\n", hello.getWord());
  printf("bye: %s\n", bye.getWord());

  return 0;
}

/* output
hello: hello
hi: hello
hello: hello
hi: hi
bye: hello
hello: hello
bye: bye
*/

With the proper modification, we now do not see any error, and the program runs as expected. In C++, there is the so-called rule of three:

If one has to explicitly declare any of class destructor, copy constructor, or copy assignment operator, then perhaps one needs to explicitly declare all of them.

Initially, we only declared explicit destructor, which caused a serious problem. Now that we have declared all three of them, we are in good shape.

Some other minor fixes to the above code are declaring the argument as const char* in setWord, and returning const char* from the getWord method.
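
As a side note, the three members can also be written with the copy-and-swap idiom, which is a different technique from the one used in the code above. Below is a minimal sketch under that assumption; SafeWord and copyOf are names invented for this example.

```cpp
#include <algorithm>
#include <cstdlib>
#include <cstring>
#include <utility>

// SafeWord is a hypothetical variant of Word that follows the rule of three
// using copy-and-swap for assignment.
class SafeWord {
  char* word;

  // Duplicate a C string with malloc; returns NULL for a NULL input.
  static char* copyOf(const char* s) {
    if (s == NULL) return NULL;
    char* copy = (char*)std::malloc(std::strlen(s) + 1);  // +1 for '\0'
    if (copy != NULL) std::strcpy(copy, s);
    return copy;
  }

public:
  SafeWord() : word(NULL) {}
  explicit SafeWord(const char* s) : word(copyOf(s)) {}
  SafeWord(const SafeWord& that) : word(copyOf(that.word)) {}  // deep copy
  ~SafeWord() { std::free(word); }  // free(NULL) is a no-op

  // The argument is taken by value, so it is already a deep copy made by the
  // copy constructor; swapping hands our old string to that temporary,
  // whose destructor then frees it. Self-assignment needs no special case.
  SafeWord& operator=(SafeWord that) {
    std::swap(word, that.word);
    return *this;
  }

  const char* get() const { return word; }
};
```

After SafeWord a("hello"); SafeWord b(a);, the two objects hold equal strings at different addresses, so each destructor frees its own memory.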

Saturday, August 13, 2016

Transition from C to C++: Class Constructors (C++98)

In this post, I will outline what a constructor does in C++ class.

Basically, a constructor is called when an object is instantiated or created. A constructor differs from a method in that it does NOT return anything, not even void. There are some notable constructors that are very important:
1. Default constructor: takes no argument.
2. Copy constructor: takes an object of the same class as an argument

The compiler automatically generates a default constructor if the programmer declares no constructor at all, and a copy constructor if the programmer does not declare one. Let's consider an example below:



#include <iostream>

class Foo {
public:
  int data;
};

using namespace std;

int main() {
  Foo obj1; // default constructor
  Foo obj2 = {1}; // aggregate initialization
  Foo obj3(obj2); // default copy constructor
  Foo obj4 = obj3; // default copy constructor
  Foo obj5; // default constructor
  obj5 = obj2; // copy operator
  cout << obj1.data << endl;
  cout << obj2.data << endl;
  cout << obj3.data << endl;
  cout << obj4.data << endl;
  cout << obj5.data << endl;
  return 0;
}

/* output
129319203
1
1
1
1
*/
Because no constructor has been explicitly declared, both the default constructor and copy constructor have been auto-generated. The implicit default constructor does not initialize the fields, so we will get some indeterminate value for obj1.data, as line 11 calls this implicit default constructor.

Line 12 is not calling the constructor; rather, this is simply aggregate initialization that we have seen in C.

Line 13 calls the copy constructor, but since it is not explicitly declared, the compiler will invoke implicit auto-generated copy constructor that simply copies the entire fields.

Line 14 also calls the copy constructor. It may be confusing, but line 16 does not call the copy constructor; instead, it calls the copy assignment operator. The difference is whether the object is being declared for the first time or not. In line 14, obj4 is being declared, so a constructor needs to be invoked. Since the argument is another object of the same class, the copy constructor is invoked. On the other hand, in line 16, no constructor needs to be called, since the default constructor has already been called in line 15. I will go over the copy assignment operator in the next post. For now, it suffices to say that it is a special type of class method.
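
One way to convince yourself of this difference is to have each special member leave a note about itself. The following is a small sketch; Tracer and the two helper functions are names made up for this illustration.

```cpp
#include <string>

// Tracer is a hypothetical class that records which special member
// function created or last modified the object.
struct Tracer {
  std::string how;
  Tracer() : how("default constructor") {}
  Tracer(const Tracer&) : how("copy constructor") {}
  Tracer& operator=(const Tracer&) {
    how = "copy assignment operator";
    return *this;
  }
};

// Declaring an object from another one, as in Foo obj4 = obj3.
std::string declaredFromCopy() {
  Tracer a;
  Tracer b = a;  // b is being created, so the copy constructor runs
  return b.how;
}

// Assigning to an object that already exists, as in obj5 = obj2.
std::string assignedAfterDeclaration() {
  Tracer a;
  Tracer c;      // the default constructor runs here
  c = a;         // c already exists, so the copy assignment operator runs
  return c.how;
}
```

declaredFromCopy() returns "copy constructor", while assignedAfterDeclaration() returns "copy assignment operator", even though both use the = sign.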

Now, let's consider the same code with explicitly-declared default constructor:


#include <iostream>

class Foo {
public:
  int data;
  /* default constructor: init data to zero */
  Foo() : data(0) {}
};

using namespace std;

int main() {
  Foo obj1; // default constructor
//  Foo obj2 = {1}; // aggregate initialization cannot be used when any constructor is explicitly declared
  Foo obj2;
  obj2.data = 1;
  Foo obj3(obj2); // default copy constructor
  Foo obj4 = obj3; // default copy constructor
  Foo obj5; // default constructor
  obj5 = obj2; // copy operator
  cout << obj1.data << endl;
  cout << obj2.data << endl;
  cout << obj3.data << endl;
  cout << obj4.data << endl;
  cout << obj5.data << endl;
  return 0;
}

/* output
0
1
1
1
1
*/

The default constructor is declared in line 7. It simply initializes data field to be zero. Note that it does not have a return type, since it is not returning anything. Also, a constructor is usually declared with public access, since it will be invoked from an external entity. data(0) will set data as 0 by calling the appropriate class's constructor. In this case, it is a simple int primitive data, so it should be equivalent to data = 0. If data type is a class, it will invoke corresponding constructor, as we can see in the next example below:



#include <iostream>

class Bar {
public:
  int i;
  Bar(int i) : i(i) {
  }
};

class Foo {
public:
  int data;
  Bar bar;
  /* default constructor: init data to zero; members are listed in declaration order */
  Foo(): data(0), bar(3) {
  }
};

using namespace std;

int main() {
  Foo obj1; // default constructor
  cout << obj1.data << endl;
  cout << obj1.bar.i << endl;
  return 0;
}

/* output
0
3
*/

Here, Foo obj1 in main() calls Foo's default constructor, which in turn calls Bar's constructor with the argument 3. Note that because Bar's constructor is explicitly declared, the compiler will not auto-generate an implicit default constructor for Bar.

Finally, the copy constructor shall be defined in the following way:


#include <iostream>

class Bar {
public:
  int i;
  /* constructor with an argument to init i to */
  Bar(int new_i) : i(new_i) {
  }
};

class Foo {
public:
  int data;
  Bar bar;
  /* default constructor: init data to zero; members are listed in declaration order */
  Foo(): data(0), bar(3) {
  }
  /* copy constructor: copy from given object, taken by const reference */
  Foo(const Foo &other) : data(other.data), bar(other.bar) {}
};

using namespace std;

int main() {
  Foo obj1; // default constructor
  cout << obj1.data << ", " << obj1.bar.i << endl;
  obj1.data = 1;
  Foo obj2(obj1);
  Foo obj3 = obj1;
  cout << obj2.data << ", " << obj2.bar.i << endl;
  cout << obj3.data << ", " << obj3.bar.i << endl;
  return 0;
}

/* output
0, 3
1, 3
1, 3
*/


In line 19, Foo's copy constructor has been explicitly declared. This constructor is invoked from lines 28 and 29, and it will in turn invoke Bar's copy constructor. However, since Bar's copy constructor has not been explicitly declared, the compiler generates an implicit copy constructor for Bar.

So, one may ask this question: if the implicit copy constructor works well, why should we bother to declare it explicitly? The answer is that sometimes the class's field is a pointer to some external data, and if we rely on the default copy constructor, it will simply copy the pointer, so the two objects will point to the same data. This will defeat the purpose of making a copy, as modifying the data will affect both objects. Hence, one needs to explicitly declare the copy constructor to allocate a copy of the external data and have the field point to this copy. We will cover this scenario in more detail in a follow-up post.

Thursday, August 11, 2016

Transition from C to C++: Dynamic Memory Allocation

I have to admit that I am most comfortable with C among programming languages, but writing code in C sometimes gives me a headache. The price of its simple and elegant structure is a narrow feature set with no support for the object-oriented programming paradigm. One of the greatest difficulties in programming with C is perhaps manual memory allocation. For any object or data whose size varies dynamically, one must manually allocate and free memory. If the programmer knows what he is doing, C allows him to create the most efficient program to perform the desired task. On the other hand, if the programmer does not know what he is doing, the resultant program will simply be very unstable, buggy, and inefficient.

When programming in C, there are simply too many things that one needs to keep track of in terms of allocating and freeing resources at the appropriate time and place. Consider the very simple example below, which takes in the user's input words and saves them.


#include <stdio.h>
#include <string.h>
#include <stdlib.h>

struct Word {
  char *word;
};

void initWord(struct Word *ptr) {
  ptr->word = NULL;
}

void setWord(struct Word *ptr, char* word) {
  /* +1 leaves room for the terminating '\0' */
  size_t size = (strlen(word) + 1) * sizeof(*ptr->word);
  if( ptr->word != NULL) {
    ptr->word = (char*)realloc(ptr->word, size);
  }
  else {
    ptr->word = (char*)malloc(size);
  }
  strcpy( ptr->word, word);
}

void freeWord(struct Word *ptr) {
  if (ptr->word != NULL)
    free(ptr->word);
}

int main( int argc, char **argv ) {
  struct Word *wordArray = (struct Word*)malloc((argc-1)* sizeof(*wordArray));
  for (int i=1; i<argc; i++) {
    initWord(&wordArray[i-1]);
    setWord(&wordArray[i-1], argv[i]);
    printf("word %d: %s\n", i, wordArray[i-1].word);
  }

  /* free every word, including the last one */
  for (int i=1; i<argc; i++) {
    freeWord(&wordArray[i-1]);
  }

  free(wordArray);

  return 0;
}
Because one does not know how many words will be there to save, one has to create an array of struct Word dynamically by calling malloc(), assign appropriate memory space for each word pointer in the struct, and finally call free() on all dynamically allocated resources when exiting. It is the programmer's role to take care of resources, and when the program grows big and complicated, one becomes more and more prone to mistakes and bugs.

Fortunately enough, there is C++ to help. Although many people, including Linus Torvalds, complain about C++, I happen to believe that C++ is a good choice for some applications, not all. There is a catch though. Somebody must do the hard work of taking care of all the memory allocation of classes.

Consider the C++ version of the same code.



#include <cstdio>
#include <cstring>
#include <vector>
#include <cstdlib>

class Word {
  char* word;
public:
  Word() {
    word = NULL;
  }
  ~Word() {
    if (word != NULL)
      free(word);
  }
  void setWord(char *word) {
    if (this->word != NULL) {
      // +1 leaves room for the terminating '\0'
      this->word = (char*)realloc(this->word, strlen(word) + 1);
    }
    else {
      this->word = (char*)malloc(strlen(word) + 1);
    }
    strcpy(this->word, word);
  }
  char* getWord() {
    return word;
  }
};

int main (int argc, char **argv) {
  std::vector<Word> words(argc-1);
  for (int i=1; i<argc; i++) {
    words[i-1].setWord(argv[i]);
    printf("word %d: %s\n", i, words[i-1].getWord());
  }

  return 0;
}
Here, I am intentionally using char* and not string object to demonstrate how C++ class constructors and destructors can be designed to relieve the burden of memory allocation from the user. Note that in main() function, the programmer never has to call malloc() or free() to allocate resources manually. All these are performed by the constructors and destructors, so the programmer can simply create and make use of class Word as if it is a primitive data. To focus on resource management, I an using as much as C features as possible, except for object-oriented structure and C++'s vector class. The vector class allows one to assign memory dynamically without calling malloc() or free(), as vector's constructor, destructor, and methods will take care of all those.

It is now clear that the burden of memory management now lies in the hand of the class and library writer. If the class or library is not well written, the entire program will become unstable. From the perspective of a programmer who simply makes use of the class and library, it is apparent that writing programs is much simpler, given that the library is well-designed.

In a follow-up post, I will go through a class design rule in C++, namely the rule of three. In fact, the above C++ example is an excellent case of a class that needs to be modified to comply with this rule.

Saturday, August 6, 2016

How to Create a C/C++ Project in Eclipse CDT with Custom Makefile and Debug

An Integrated Development Environment (IDE) speeds up the development process for most of us. In this tutorial, I will go over how to create a C/C++ project in Eclipse and use a custom Makefile for compilation.

First, download Eclipse with the CDT plugin from here and install it. Launch the application when the installation is complete.

I will go over a C++ project, but the steps should be very similar for a C project. From the menu, select File --> New --> C++ Project. Enter the Project name, say OpenCVExampleProject, and under Project type, select Executable --> Empty Project. Under Toolchains, select the appropriate toolchain, such as Linux GCC. If Eclipse has detected the toolchain automatically, it will enable the Finish button; if not, you will need to proceed and configure the toolchain by entering its prefix and path.

Next, create a src directory: right-click OpenCVExampleProject in the navigation pane on the left, select New --> Folder, and enter src as the Folder name. To add a source file, right-click the newly created src folder, select New --> Source File, enter a name, say OpenCVExample.cpp, and select Finish.

Now, let's add the following lines of code to the OpenCVExample.cpp file:

#include <opencv2/opencv.hpp>
#include <cstdio>

using namespace cv;

int main( int argc, char** argv )
{
  Mat image;
  if( argc == 2 ) image = imread( argv[1], 1 );  // argv[1] is NULL when no path is given

  if( !image.data )
    {
      printf( "No image data \n" );
      return -1;
    }

  namedWindow( "Display Image", CV_WINDOW_NORMAL | CV_WINDOW_KEEPRATIO );
  imshow( "Display Image", image );

  waitKey(0);

  return 0;
}

Make sure to save the source file so that Eclipse can check the code for errors. If you haven't already installed the OpenCV library on your system, do so now. On Ubuntu, simply run
$ sudo apt-get install -y libopencv-dev

If you have followed along up to this point, Eclipse should not flag any errors in the source file. Well, let's compile it and see what happens.

Build the project by selecting Project --> Build All in the menu. You will see errors, similar to
Building target: OpenCVExampleProject
Invoking: GCC C++ Linker
g++  -o "OpenCVExampleProject"  ./src/OpenCVExample.o   
./src/OpenCVExample.o: In function `main':
/home/linuxnme/workspace/OpenCVExampleProject/Debug/../src/OpenCVExample.cpp:9: undefined reference to `cv::imread(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)'
makefile:45: recipe for target 'OpenCVExampleProject' failed
/home/linuxnme/workspace/OpenCVExampleProject/Debug/../src/OpenCVExample.cpp:17: undefined reference to `cv::namedWindow(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)'
/home/linuxnme/workspace/OpenCVExampleProject/Debug/../src/OpenCVExample.cpp:18: undefined reference to `cv::_InputArray::_InputArray(cv::Mat const&)'
/home/linuxnme/workspace/OpenCVExampleProject/Debug/../src/OpenCVExample.cpp:18: undefined reference to `cv::imshow(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cv::_InputArray const&)'
/home/linuxnme/workspace/OpenCVExampleProject/Debug/../src/OpenCVExample.cpp:20: undefined reference to `cv::waitKey(int)'
./src/OpenCVExample.o: In function `cv::Mat::~Mat()':
/usr/include/opencv2/core/mat.hpp:278: undefined reference to `cv::fastFree(void*)'
./src/OpenCVExample.o: In function `cv::Mat::operator=(cv::Mat const&)':
/usr/include/opencv2/core/mat.hpp:298: undefined reference to `cv::Mat::copySize(cv::Mat const&)'
./src/OpenCVExample.o: In function `cv::Mat::release()':
/usr/include/opencv2/core/mat.hpp:367: undefined reference to `cv::Mat::deallocate()'
collect2: error: ld returned 1 exit status
make: *** [OpenCVExampleProject] Error 1

23:14:46 Build Finished (took 843ms)


The errors simply tell you that the OpenCV library has not been linked. Now, it is time to create our own Makefile and compile. Select Project --> Properties --> C/C++ Build and uncheck Generate Makefiles automatically under Makefile generation. Next, delete the objects.mk and sources.mk files under the Debug folder in the project root directory. These files were generated automatically by Eclipse.

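Edit Debug/makefile so that it looks similar to the following (ignore the makefile edit warning). Note that the recipe line must begin with a tab character, not spaces:
all:
	g++ -g -o OpenCVExample ../src/OpenCVExample.cpp `pkg-config --libs opencv`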

Make sure to save this file, and let's build again. The executable should now build successfully, with a log similar to the following:
23:22:53 **** Incremental Build of configuration Debug for project OpenCVExampleProject ****
make all 
g++ -g -o OpenCVExample ../src/OpenCVExample.cpp `pkg-config --libs opencv`

23:22:54 Build Finished (took 1s.35ms)

To test whether it built successfully, copy your favorite image file, say image.jpg, into the project root directory. Select Run --> Profile Configuration --> New launch configuration --> Arguments and enter image.jpg

Finally, select Run --> Debug. Eclipse may show a Confirm Perspective Switch prompt for debugging purposes; choose Yes. Eclipse will automatically break at the beginning of the main function. Press the [F8] key to resume. You should be able to see your image window, which disappears after any key press. To go back to the previous perspective, simply click on the C/C++ icon in the top-right corner.

Wednesday, August 3, 2016

How to Mount Bootcamp NTFS Drive on Mac with Write Access

If you have installed Windows on your Mac using Bootcamp, odds are that your Bootcamp partition is NTFS, which is the default file system for Windows, including Windows 10. By default, your Mac will be able to read this partition, but won't be able to write to it.
$ cd /Volumes/BOOTCAMP
$ touch test
touch: test: Read-only file system

By default, the Bootcamp drive is mounted at /Volumes/BOOTCAMP, but it may differ on your system. To check your disk partitions, run
$ diskutil list
/dev/disk0 (internal, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *121.3 GB   disk0
   1:                        EFI EFI                     209.7 MB   disk0s1
   2:          Apple_CoreStorage Mac OS X                70.5 GB    disk0s2
   3:                 Apple_Boot Recovery HD             650.0 MB   disk0s3
   4:       Microsoft Basic Data BOOTCAMP                50.0 GB    disk0s4

As you can see above, on my system BOOTCAMP is the name of the partition, with 50GB of disk space assigned to it.

Now, let's enable write access to the NTFS partition. To do so, edit the /etc/fstab file,
$ sudo vim /etc/fstab

and add the following line
LABEL=BOOTCAMP none ntfs rw,auto,nobrowse

Make sure that you replace BOOTCAMP with the name of your NTFS partition.

The change will take effect after re-mounting the partition. To do so, open up Disk Utility from the /Applications/Utilities folder, select the BOOTCAMP partition, and click on the Unmount button to unmount it. Once it is unmounted, simply click on the Mount button to re-mount it.

You should now be able to read and write. For example, the following commands will create a test file on the NTFS partition and remove it.
$ cd /Volumes/BOOTCAMP
$ touch test
$ rm test

Write access will remain as long as you keep the inserted line in the /etc/fstab file.