Abstract:
Human action recognition is one of the most intriguing research areas in modern artificial intelligence and computer vision. Researchers have proposed a variety of methods to enable machines to recognize human actions. One of the most widely explored approaches uses 3D depth images; another uses human silhouettes to predict actions. In this thesis we introduce a novel method for extracting key frames for human action recognition, in which actions are represented by 3D skeletal joint locations. Key frames are selected according to the distance between each frame and its neighbours, so that a fixed number of frames is chosen from a sequence of arbitrary length. We use Microsoft Kinect to extract the joint locations; it provides twenty joint locations per person in a 3D Cartesian coordinate system. Although Microsoft Kinect’s joint location extraction contains some errors, we consider the locations to be accurate, and our research starts from that assumption. We introduce a new feature representation that combines the histogram of 3D joint locations (HOJ3D) with the static posture feature of the 3D skeletal joint locations. By combining the two representations we try to overcome their respective disadvantages: HOJ3D fails to represent how individual joints change their locations with respect to other joints, while the static posture feature fails to represent how the joints are distributed. We then use a Hidden Markov Model (HMM) to recognize human actions. We perform an extensive set of experiments on publicly available datasets and compare our method with some of the existing methods in the field. The evaluation follows n-fold validation, and the results show that our method is more accurate and robust, and that it consumes less time while generating key frames.
We also compare the performance obtained with different numbers of key frames and different numbers of HMM hidden states to characterize the behaviour of the proposed system. The method can be used in real time to recognize human actions and can be deployed for security, augmented reality, and other computer vision applications.
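
The key-frame idea in the abstract (score each frame by its distance to its neighbours, then keep a fixed number of frames from a sequence of any length) can be illustrated with a minimal sketch. The abstract does not give the exact rule, so everything below is an assumption made for illustration: the function names, the choice of summed per-joint Euclidean distance as the frame distance, and the rule of keeping the k highest-scoring frames. It is not the thesis's actual algorithm.

```python
import math


def frame_distance(a, b):
    """Sum of Euclidean distances between corresponding joints of two frames.

    Each frame is a list of (x, y, z) joint positions. This distance is an
    assumed choice, not taken from the thesis.
    """
    return sum(math.dist(p, q) for p, q in zip(a, b))


def select_key_frames(frames, k):
    """Return the indices of k key frames, in temporal order.

    Assumed scoring rule (hypothetical): a frame's score is its summed
    distance to its immediate previous and next frames, and the k frames
    with the highest scores are kept.
    """
    n = len(frames)
    if k >= n:
        # Fewer frames than requested: keep everything.
        return list(range(n))
    scores = []
    for i in range(n):
        score = 0.0
        if i > 0:
            score += frame_distance(frames[i], frames[i - 1])
        if i < n - 1:
            score += frame_distance(frames[i], frames[i + 1])
        scores.append(score)
    # Rank frames by score, keep the top k, then restore temporal order.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)
```

With Kinect data each frame would hold twenty joints; the sketch works for any number of joints per frame. Note that under this assumed rule the first and last frames have only one neighbour, so they score lower than interior frames with comparable motion.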
Description:
Supervised by
Md. Hasanul Kabir, Ph.D.
Associate Professor,
Department of Computer Science and Engineering (CSE),
Islamic University of Technology (IUT),
Board Bazar, Gazipur-1704, Bangladesh.