Time: Sep. 22, Afternoon

Title: Human Motion Tracking with Microsoft Kinect

Instructor: Wenbing Zhao, Professor, Cleveland State University, USA (w.zhao1@csuohio.edu)

Instructor Biography:

Dr. Zhao is a Full Professor in the Department of Electrical Engineering and Computer Science, Cleveland State University (CSU). He earned his Ph.D. degree at the University of California, Santa Barbara in 2002. Dr. Zhao has been doing research on smart and connected health since 2010 and on distributed systems since 1998. He has an active sponsored research grant on building a Kinect-based system to enhance safe patient handling in nursing homes, and has been teaching a course on Kinect application development at CSU. Dr. Zhao has over 150 peer-reviewed publications and a US patent (pending) on privacy-aware selective human activity tracking using programmable depth cameras. He has served on several research panels for the US National Science Foundation, as the Program Chair for IEEE Smart World Congress (Toulouse, France) in 2016, and as a member of the technical program committee for many IEEE conferences.

Keywords: Microsoft Kinect, Depth Cameras, Gesture and Activity Recognition, Machine Learning

Intended Students and Prerequisites

Anyone with an undergraduate degree in computer science or computer engineering can follow the tutorial.

What Can Attendees Expect to Learn?

This tutorial will enable participants to understand the Kinect technology, learn various algorithms for human activity recognition, get familiar with tools for building interactive avatar-based 3D graphical interfaces, and learn how to generate real-time visual and haptic feedback by integrating wearable devices such as smart watches.

Course Outline

This tutorial will provide a comprehensive review of the applications of Microsoft Kinect in various domains and recent studies on human motion tracking and recognition that power these applications. This tutorial will contain the following sections: (1) Introducing the Kinect technology; (2) A systematic review of Kinect applications; (3) Research work on human motion tracking with Kinect; (4) Research work on human motion recognition with Kinect; (5) How to program with Kinect SDK and Unity3D for powerful 3D applications; (6) How to integrate Microsoft Kinect with wearable computing.


This tutorial contains the following six major elements.

Element 1: The Kinect technology, including its features, technical specification, programming interfaces, and the depth sensing techniques.

Element 2: Various applications of the Kinect technology. The applications are categorized into the areas of healthcare (physical therapy, operating room assistance, and fall detection and prevention), virtual reality and gaming, natural user interface, robotics control and interaction, retail services, workplace safety training, speech and sign language recognition, 3D recognition, and education and performing arts.

Element 3: Research work on human motion tracking. In this element, various computer vision techniques used to achieve human pose and skeleton estimation will be discussed. The foundation for human pose and skeleton estimation is per-pixel body part classification, which is followed by estimating hypotheses of body joint positions by finding local centroids of the body part probability mass using mean shift mode detection. In addition, I will introduce the research on hand detection and hand pose estimation.
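The mode-finding step mentioned in Element 3 can be illustrated with a minimal sketch. It assumes that each pixel classified as a given body part has already been back-projected to a 3D point and carries its classifier probability as a weight. The function name `mean_shift_mode`, the Gaussian kernel, and the bandwidth value are illustrative assumptions, not the Kinect SDK's API or the exact procedure used in the Kinect pipeline.

```python
import math


def mean_shift_mode(points, weights, start, bandwidth=0.1, max_iter=100, tol=1e-6):
    """Follow mean shift updates from `start` to a local mode of a weighted 3D density.

    points  : list of (x, y, z) tuples, e.g. pixels labeled as one body part (assumed input)
    weights : per-point body part probabilities (assumed input)
    start   : initial guess for the joint position
    """
    current = tuple(start)
    for _ in range(max_iter):
        total = 0.0
        acc = [0.0, 0.0, 0.0]
        for p, w in zip(points, weights):
            d2 = sum((a - b) ** 2 for a, b in zip(p, current))
            # Gaussian kernel scaled by the classifier probability (an assumption).
            k = w * math.exp(-d2 / (2.0 * bandwidth ** 2))
            total += k
            for i in range(3):
                acc[i] += k * p[i]
        if total == 0.0:
            break  # no point has any influence at this location
        new = tuple(a / total for a in acc)
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(new, current)))
        current = new
        if shift < tol:
            break  # converged to a local centroid of the probability mass
    return current


# Hypothetical data: a tight cluster of confident pixels plus one distant outlier.
points = [(0.0, 0.0, 1.0), (0.02, 0.0, 1.0), (-0.02, 0.0, 1.0),
          (0.0, 0.02, 1.0), (0.0, -0.02, 1.0), (1.0, 1.0, 2.0)]
weights = [0.9, 0.9, 0.9, 0.9, 0.9, 0.2]
joint = mean_shift_mode(points, weights, start=(0.05, 0.05, 1.0))
```

Starting the search from several seeds and keeping the modes with the most accumulated weight would yield multiple joint hypotheses; this sketch follows a single seed only.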

Element 4: Research work on human motion recognition. Unlike human motion tracking, which focuses on the recognition of human body joints and segments, human motion recognition aims to understand the semantics of human gestures and activities. A gesture typically involves one or two hands, and possibly body poses, to convey some concrete meaning, such as waving the hand to say goodbye. An activity usually refers to a sequence of full body movements that a person performs, such as walking, running, or brushing teeth, which does not necessarily convey a meaning to the computer or to other persons. Rehabilitation exercises form a special type of activity. The approaches used in gesture and activity recognition are divided into the following categories:

· Algorithmic-based recognition: In this approach, a gesture or an activity is recognized based on a set of manually defined rules. Algorithmic-based recognition is popular in gaming and healthcare applications because the gestures and/or activities are usually very well defined, relatively simple, and repetitive in nature. Each gesture or activity normally has a pre-defined starting and ending pose that can be used to delineate an iteration of the gesture or activity, which makes the algorithmic-based approach a good fit for such application domains.

· Direct-matching-based recognition: In this approach, the unknown gesture or activity is directly compared with a set of templates. Dynamic time warping (DTW) is the best-known technique for analyzing the similarity between two temporal sequences that may vary in time and speed, by finding an optimal alignment between them. Typically one sequence is an unknown sequence to be classified and the other is a pre-classified reference sequence. The difference between the two sequences is expressed as the distance between them. In addition to DTW, other direct matching methods include the maximum correlation coefficient and Earth Mover's Distance.

· Machine-learning-based recognition: This approach typically relies on one or more sophisticated statistical models, such as the Hidden Markov Model (HMM), Artificial Neural Networks (ANNs), Support Vector Machine (SVM), Decision Forest, and AdaBoost, to capture the unique characteristics of a gesture or an activity. Most such models have a large number of parameters, which have to be determined in a training step based on pre-labeled motion data (including both data for the gesture to be recognized and other motion data that are known not to be the specific gesture). In general, the larger the feature set used for classification, the larger the training dataset required. For some models, such as ANNs, additional modeling parameters have to be manually tuned to achieve good classification accuracy. Furthermore, regression-based methods have also been used for motion recognition.
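As an illustration of the direct-matching category above, the following is a minimal sketch of DTW-based template matching. It assumes each motion is reduced to a one-dimensional sequence (for example, one joint angle per frame) and uses absolute difference as the per-frame cost. The names `dtw_distance` and `classify`, the template labels, and the sample values are hypothetical.

```python
def dtw_distance(seq_a, seq_b, dist=lambda x, y: abs(x - y)):
    """Return the DTW distance between two sequences via dynamic programming."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] is the minimal accumulated cost of aligning
    # the first i items of seq_a with the first j items of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # seq_b frame repeated
                                 cost[i][j - 1],      # seq_a frame repeated
                                 cost[i - 1][j - 1])  # both sequences advance
    return cost[n][m]


def classify(unknown, templates):
    """Return the label of the reference sequence closest to `unknown` under DTW."""
    return min(templates, key=lambda label: dtw_distance(unknown, templates[label]))


# Hypothetical pre-classified reference sequences (e.g. arm elevation per frame).
templates = {
    "raise_arm": [0, 1, 2, 3, 2, 1, 0],
    "rest": [0, 0, 0, 0, 0, 0, 0],
}
# The same movement as "raise_arm", performed more slowly.
unknown = [0, 0, 1, 2, 2, 3, 2, 1, 0]
label = classify(unknown, templates)
```

Because DTW allows frames to be repeated during alignment, the slower performance still matches its template with zero distance, whereas a frame-by-frame comparison would penalize the difference in speed. For full skeleton data, `dist` could be replaced with a distance between joint-position vectors.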

Element 5: How to use the Kinect SDK and the Unity3D framework to develop practical and powerful Kinect applications for human motion tracking and recognition. 

Element 6: How to integrate Microsoft Kinect with wearable devices for real-time feedback and for multimodal tracking.

Best Paper Awards 

Several best paper awards, selected from different sessions, will be given at the dinner banquet of ICVISP 2017.



Ms. Anna H. M. Wong

Email: icvisp@iased.org

Tel:+852-30696823 (English)

Monday-Friday, 9:30am-12:00pm and 1:30pm-6:00pm