In many reverse engineering and binary code analysis tasks (e.g., hybrid disassembly, indirect jump resolution, and decoupled taint analysis), detailed knowledge of dynamic control flow is of great value. However, high runtime overhead besets the complete collection of dynamic control flow. Previous efforts on efficient path profiling cannot be applied directly to obfuscated binary code, for which an accurate control flow graph is typically absent. To address these challenges, we present BinCFP, an efficient multi-threaded binary code control flow profiling tool that takes advantage of pervasive multi-core platforms. BinCFP relies on dynamic binary instrumentation to work with unmodified binary code. The key to BinCFP is a multi-threaded fast buffering scheme that supports processing trace buffers asynchronously. To achieve better performance, we also apply a set of optimizations that reduce control flow profile size and instrumentation overhead. Our design enables complete dynamic control flow collection for an obfuscated binary execution. We have implemented BinCFP on top of Pin. Comparative experiments on SPEC2006 and obfuscated common utility programs show that BinCFP outperforms previous work in several ways. In addition, BinCFP's control flow profiles are only about 49.2% of the size of those produced by the conventional design.
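
The idea of processing trace buffers asynchronously can be illustrated with a minimal sketch. It is not BinCFP's implementation (BinCFP is built on Pin, a C++ framework) and assumes a simplified model: the profiled program is a stream of (source, target) branch events, a producer fills fixed-size buffers, and worker threads aggregate full buffers into edge counts while the producer keeps running. The names `profile`, `BUFFER_CAPACITY`, and `num_workers` are hypothetical and chosen only for this illustration.

```python
import queue
import threading
from collections import Counter

# Hypothetical capacity; a real trace buffer would hold far more records.
BUFFER_CAPACITY = 4


def profile(branch_events, num_workers=2):
    """Count control flow edges from an iterable of (source, target) pairs."""
    full_buffers = queue.Queue()
    # One private counter per worker, so workers never contend on shared state.
    partial_counts = [Counter() for _ in range(num_workers)]

    def worker(index):
        while True:
            buf = full_buffers.get()
            if buf is None:  # sentinel: no more buffers will arrive
                break
            partial_counts[index].update(buf)

    workers = [
        threading.Thread(target=worker, args=(i,)) for i in range(num_workers)
    ]
    for t in workers:
        t.start()

    # Producer side: stands in for the instrumented application thread.
    current = []
    for event in branch_events:
        current.append(event)
        if len(current) == BUFFER_CAPACITY:
            full_buffers.put(current)  # hand off the full buffer
            current = []               # continue with a fresh one
    if current:
        full_buffers.put(current)      # flush the final partial buffer

    for _ in workers:
        full_buffers.put(None)         # one sentinel per worker
    for t in workers:
        t.join()

    # Merge the per-worker results into a single edge profile.
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total
```

Calling `profile([(0x400, 0x410), (0x410, 0x400)])` returns a `Counter` keyed by edge. In this sketch the producer hands off a full buffer and continues immediately, which is the property an asynchronous buffering scheme aims for; buffer recycling and the profile size optimizations mentioned in the abstract are not modeled.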