Android Custom Layout: FlowLayout

13 Dec 2013

Category: programming. Tags: android, java, ui.

Hello, Android devs! This is my first blog post about Android. The Android SDK provides a bunch of useful layouts: FrameLayout, LinearLayout, RelativeLayout, etc. So where is FlowLayout, which works like a multiline TextView but holds views instead of text? Developers can just add views to a FlowLayout; each view is placed to the right of the previous one and wraps to a new row when the current row is full. I’m gonna show you how you could implement your own FlowLayout in fewer than 100 lines of code.
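The wrapping rule described above can be sketched independently of Android. This is a minimal illustration in Python rather than the Java of the post; the function name `flow_layout` and its parameters are hypothetical, and it ignores padding, margins and gravity. In a real Android ViewGroup the same loop would presumably live in onMeasure and onLayout.

```python
def flow_layout(child_sizes, max_width):
    """Return ([(x, y), ...], total_height) for children given as (width, height)."""
    positions = []
    x = y = 0
    row_height = 0
    for width, height in child_sizes:
        # Wrap to a new row when this child would overflow the current row.
        # The x > 0 check keeps an over-wide child from creating an empty row.
        if x > 0 and x + width > max_width:
            x = 0
            y += row_height
            row_height = 0
        positions.append((x, y))
        x += width
        # A row is as tall as its tallest child.
        row_height = max(row_height, height)
    return positions, y + row_height
```

For example, three 40-pixel-wide children in a 100-pixel-wide container put the first two on one row and the third on the next.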


Image Duplication Detection

19 Apr 2013

Category: programming. Tags: image, python.

In my last post, I downloaded thousands of images from Jiandan OOXX. (What a cool site!) While I was enjoying this amazing collection, I found that it contained many duplicate images. I don’t want my disk space wasted on duplicates, so I need to figure out a way to deal with them.

Detecting duplicate images is totally different from detecting duplicate ordinary files, because two copies of the same image may be in different formats, have different dimensions, and have different file sizes. Hash values of the file contents can’t be relied on to detect duplicate images; other image features have to be taken into consideration.
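To illustrate why a content hash is not enough, here is a small sketch of one possible image feature, an "average hash". The excerpt does not say which features the post ends up using, so this technique is an assumption, as are the names `average_hash` and `hamming_distance`. The image is represented as plain rows of grayscale values; a real version would first decode and shrink the image with an imaging library.

```python
import hashlib

def average_hash(pixels):
    """Fingerprint a small grayscale image given as rows of 0-255 values."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    # One bit per pixel: 1 if brighter than the mean, otherwise 0.
    return [1 if value > mean else 0 for value in flat]

def hamming_distance(hash_a, hash_b):
    """Count the positions where two equal-length fingerprints differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))

# The same tiny picture, once as-is and once uniformly brightened.
original = [[10, 200], [220, 30]]
brighter = [[value + 20 for value in row] for row in original]

def md5_of(pixels):
    # Hash the raw pixel bytes, as a file-level hash would.
    return hashlib.md5(bytes(v for row in pixels for v in row)).hexdigest()
```

Here `md5_of(original)` and `md5_of(brighter)` differ, while the two average hashes are identical, so a small Hamming distance can flag the pair as likely duplicates.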


Event Based HTML Parsing with Python

18 Apr 2013

Category: programming. Tags: html, parsing, python.

HTML parsing is a useful technique for crawlers. Three major techniques for parsing HTML are regular expressions, tree-model based parsing and event based parsing. Regular expressions can be applied to any kind of parsing, but they’re tricky and hard for other people to understand or maintain.

Tree-model based parsing is powerful and popular: a lot of HTML/XML parsing libraries construct an in-memory tree-like model to represent the structure of the parsed HTML. The first drawback of this kind of parsing is obvious: constructing the tree model requires memory for the entire HTML document, even if we only need a small part of it. The second problem is the cost in CPU time. If a big HTML file is parsed and the in-memory model is huge, a lot of CPU time is spent building and traversing the model.
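For contrast, an event based parser never builds a tree; it calls back into your code as each tag streams past. The sketch below uses Python’s standard html.parser module to collect image URLs. The class name `ImageSrcCollector` and the helper `collect_image_sources` are made up for illustration, and the full post may well use a different library.

```python
from html.parser import HTMLParser

class ImageSrcCollector(HTMLParser):
    """Remember the src attribute of every <img> tag seen so far."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Called once per opening tag; attrs is a list of (name, value) pairs.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)

def collect_image_sources(html):
    parser = ImageSrcCollector()
    parser.feed(html)   # may be called repeatedly with successive chunks
    parser.close()
    return parser.sources
```

Because feed() accepts the document in chunks, only the collected URLs stay in memory, not the whole page structure.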


Mac Start Guide for Linuxers

29 Oct 2012

Category: workspace. Tags: mac, linux.

I just got a late 2009 MacBook Air and I have been using it for a month. As an advanced Linux user, I found Mac OS X very easy to use, thanks to its powerful Unix command line tools. As a GUI user, I think the Mac provides a consistent and decent user interface. I recommend that all Linux users who are exhausted with Linux desktops give it a try. Here I want to give Linuxers a start guide to Mac OS X.


Linux IPC with Pipes

27 Jul 2012

Category: programming. Tags: ipc, pipe.

Inter-Process Communication (IPC) is a set of methods for exchanging data among multiple processes. IPC is a very common mechanism in Linux, and the pipe may be one of the most widely used IPC methods. When you type cat foo | grep bar, you create a pipe that connects the stdout of cat to the stdin of grep. A pipe, as its name states, can be understood as a channel with two ends. It is actually implemented using a piece of kernel memory. The system call pipe creates a pipe and two associated file descriptors: fd[0] for reading from the pipe and fd[1] for writing to the pipe.
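The two ends of a pipe can be tried out from Python, whose os.pipe() wraps the same system call and returns the read and write descriptors as a pair. This is only a sketch: the helper name `pipe_round_trip` is made up, and it writes and reads within a single process, which is safe only for messages small enough to fit in the kernel’s pipe buffer. Real IPC would hand one end to another process.

```python
import os

def pipe_round_trip(message):
    """Write bytes into a pipe and read them back out of the other end."""
    read_fd, write_fd = os.pipe()   # corresponds to fd[0] and fd[1]
    try:
        os.write(write_fd, message)
    finally:
        # Closing the write end lets the reader see end-of-file.
        os.close(write_fd)
    try:
        return os.read(read_fd, len(message) + 1)
    finally:
        os.close(read_fd)
```

Calling `pipe_round_trip(b"hello")` returns the same bytes that were written, having passed through kernel memory on the way.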