A Survey on Visual Object Tracking Based on Biologically Inspired Trackers

Abstract


Visual tracking is the process of locating, identifying, and determining the dynamic configuration of one or many moving (possibly deformable) objects (or parts of objects) in each frame of one or several cameras. To build a general tracking system, this paper briefly reviews recent progress in image representation, appearance models, and motion models. The models reviewed here are basic enough to be applied to tracking either a single target or multiple targets. Special attention is given to appearance models, which have attracted the most interest in recent work. We discuss the key techniques and the factors that make it difficult for a tracker to handle changes in object appearance: camera motion, illumination variation, shape deformation, and partial occlusion. State-of-the-art tracking-by-detection and online boosting methods (e.g. TLD, Online Boost, MIL-Track) perform well, so we consider them together for single-person tracking.

 

Keywords: tracking, target, models, appearance


Introduction

Object tracking plays a vital role in computer vision. In a video, the target object moves from frame to frame. The tracker locates, in each frame, the region that best matches the original object; this process is known as object tracking. Object tracking plays a role in the following tasks:

·         Vehicle navigation – obstacle avoidance and video-based route planning.

·         Video indexing – the retrieval of videos from a database.

·         Surveillance – monitoring suspicious activities.

·         Traffic monitoring – gathering information on traffic status.

It has a wide variety of applications, including motion analysis, video surveillance, human-computer interaction, and robot perception. Object tracking can be difficult due to real-time processing requirements, noise in images, and loss of information. It has been intensively studied over the previous decade. To improve visual object tracking, one may need to address these challenges by developing better feature representations of visual targets and more effective tracking models. In the following, image representations, appearance models, and motion models are discussed briefly.

1. Image representation

In image representation, features can be described by texture, points, contours, and shape. For tracking, object detection can be adapted to any of these representations, for example, ships at sea, cars on a road, or fish in a tank.

In this section, the typical image features and object shape representations commonly employed for tracking are described first, followed by the joint shape representations [1].

 

1.1 Typical Image Features

–Color features. Color features (e.g. the color histogram) have low computational cost and are invariant to point-to-point transformations [3]. In [5], Efros selects the best color features from five color spaces to model skin color for face tracking. However, color features are not robust against illumination changes, and they are not discriminative enough due to their lack of spatial information.
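To make this concrete, the following is a minimal Python sketch using OpenCV of extracting a normalized hue-saturation histogram as a color feature; the (x, y, w, h) box format and bin count are illustrative assumptions, not prescribed values.

import cv2

def color_histogram(frame_bgr, box, bins=16):
    """Normalized hue-saturation histogram of a target box (x, y, w, h)."""
    x, y, w, h = box
    roi = frame_bgr[y:y + h, x:x + w]
    hsv = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
    # 2-D histogram over hue (range 0-180) and saturation (range 0-256)
    hist = cv2.calcHist([hsv], [0, 1], None, [bins, bins], [0, 180, 0, 256])
    return cv2.normalize(hist, hist, 0, 1, cv2.NORM_MINMAX)

Two such histograms can be compared cheaply, e.g. with cv2.compareHist(h1, h2, cv2.HISTCMP_BHATTACHARYYA), but the feature carries no spatial information, as noted above.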

 

–Texture features. Texture features (e.g. the local binary pattern, LBP) have high discriminative ability, though they are computationally expensive. Nguyen and Smeulders classify texture features using LDA [6].
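As an illustration, here is a minimal NumPy sketch of the basic 8-neighbor LBP code; this is a simplified variant for exposition, and production trackers typically use optimized or uniform-pattern implementations.

import numpy as np

def lbp_8neighbors(gray):
    """Basic 8-neighbor LBP codes for a grayscale image.

    Each interior pixel is compared with its 8 neighbors; the comparison
    bits form a code in [0, 255]. A histogram of these codes is a common
    texture feature.
    """
    g = gray.astype(np.int16)
    c = g[1:-1, 1:-1]                       # center pixels
    # neighbor offsets in clockwise order starting at the top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        code |= (neighbor >= c).astype(np.uint8) << bit
    return code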

 

1.2 Shape representation

 

–Points. In general, the point representation is suitable for tracking objects that occupy small regions in an image [1]. The object is represented by a single point, that is, the centroid (Figure 1(a)) [7], or by a set of points (Figure 1(b)) [8].
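For instance, the centroid of Figure 1(a) can be computed from a binary foreground mask using image moments; a minimal OpenCV sketch follows, where the mask is assumed to come from some detector or background subtractor.

import cv2

def centroid(mask):
    """Centroid of a binary foreground mask, via image moments."""
    m = cv2.moments(mask, binaryImage=True)
    if m["m00"] == 0:          # empty mask: no object present
        return None
    return (m["m10"] / m["m00"], m["m01"] / m["m00"])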

 

–Primitive geometric shapes. Object shape is represented by a rectangle, an ellipse (Figure 1(c), (d)) [9], etc. Object motion for such representations is usually modeled by a translation, affine, or projective (homography) transformation [1]. Though primitive geometric shapes are more suitable for representing simple rigid objects, they are also used for tracking nonrigid objects [1].

 

Fig. 1. Object representations. (a) Centroid, (b) multiple points, (c) rectangular patch, (d) elliptical patch, (e) part-based multiple patches, (f) object skeleton, (g) complete object contour, (h) control points on object contour, (i) object silhouette.

 

 

–Object silhouette and contour. A contour representation defines the boundary of an object (Figure 1(g), (h)). The region inside the contour is called the silhouette of the object [1].

 

–Articulated shape models. Articulated objects are composed of body parts that are held together by joints. For example, the human body is an articulated object with torso, legs, hands, head, and feet connected by joints. The relationships between the parts are governed by kinematic motion models, for example, joint angles. To represent an articulated object, one can model the constituent parts using cylinders or ellipses, as shown in Figure 1(e) [1].

 

2. Appearance model

In real-world surveillance scenes, the target's appearance tends to change during tracking (i.e. variation in target appearance), and the background may include moving objects (i.e. variation in the scene). The less associated the target's appearance model is with those variations, the more specific it is in representing that particular object, and the less likely the tracker is to be confused by other objects or background clutter [3].

 

2.1 Kernel-based generative appearance models (KGAMs)

Kernel-based generative appearance models (KGAMs) utilize kernel density estimation to construct kernel-based visual representations, and then carry out mean shift for object localization. They are divided into three branches: color-driven KGAMs, shape-integration KGAMs, and non-symmetric KGAMs [2].

 

Color-driven KGAMs. The color-driven KGAM [9] builds a color-histogram-based visual representation regularized by a spatially smooth isotropic kernel. However, the tracker of Comaniciu et al. [9] only considers color information and therefore ignores other useful cues such as edges and shape, resulting in sensitivity to background clutter and occlusions.
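The following is a minimal sketch of this color-driven mean-shift idea using OpenCV's built-in back-projection and meanShift; the video path, box format, and bin count are illustrative assumptions, not the exact setup of [9].

import cv2

def track_mean_shift(video_path, init_box):
    """Color-histogram mean-shift tracking in the spirit of [9].

    init_box is (x, y, w, h) for the target in the first frame. Only hue
    information is used, so the tracker shares the sensitivity to clutter
    and occlusion discussed above.
    """
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    x, y, w, h = init_box
    hsv_roi = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv_roi], [0], None, [16], [0, 180])   # hue histogram
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
    box = init_box
    term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        back_proj = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
        _, box = cv2.meanShift(back_proj, box, term)   # shift window to the density mode
        yield box
    cap.release()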

Shape-integration KGAMs. The main aim of shape integration is to build a kernel density function in the joint color-shape space. It is based on two spatially normalized and rotationally symmetric kernels describing the color and object-boundary information [2].

Non-symmetric KGAMs. Conventional KGAMs use a symmetric kernel (e.g., a circle or an ellipse), leading to a large estimation bias when estimating the complicated underlying density function. The non-symmetric KGAM instead simultaneously estimates the image coordinates, scales, and orientations in a small number of mean shift iterations [2].

 

2.2 Boosting-based discriminative appearance models (BDAMs)

Boosting-based discriminative appearance models (BDAMs) are widely used in visual object tracking because of their powerful discriminative learning capabilities. They are classified into self-learning and co-learning BDAMs. Self-learning BDAMs use discriminative information from a single source to guide the task of object/non-object classification, while co-learning BDAMs exploit multi-source discriminative information for object detection.

 

As discussed in [2], BDAMs also take different strategies for visual representation, i.e., single-instance and multi-instance ones. Single-instance BDAMs require precise object localization. If a precise object localization is not available, these tracking algorithms may use sub-optimal positive samples to update their corresponding object/non-object discriminative classifiers, which may lead to model drift. Moreover, object detection and tracking have an inherent ambiguity: precise object locations may be unknown even to human labelers. To deal with this ambiguity, multi-instance BDAMs represent an object by a set of image patches around the tracker location.
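As a brief usage sketch of a multi-instance BDAM, OpenCV ships an implementation of the MIL tracker. The snippet below assumes an OpenCV build that exposes cv2.TrackerMIL_create (in some builds it lives under cv2.legacy instead); the file name and initial box are hypothetical.

import cv2

cap = cv2.VideoCapture("person.mp4")        # hypothetical input video
ok, frame = cap.read()
tracker = cv2.TrackerMIL_create()
tracker.init(frame, (200, 150, 60, 120))    # hypothetical initial (x, y, w, h) box

while True:
    ok, frame = cap.read()
    if not ok:
        break
    found, box = tracker.update(frame)      # classify patches around the last location
    if found:
        x, y, w, h = map(int, box)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("MIL tracking", frame)
    if cv2.waitKey(30) == 27:               # Esc to quit
        break
cap.release()
cv2.destroyAllWindows()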

 

2.3 Randomized learning-based discriminative appearance models (RLDAMs)

In principle, randomized learning techniques can build a diverse classifier ensemble by performing random input selection and random feature selection. In contrast to boosting and SVMs, they are more computationally efficient and easier to extend to multi-class learning problems. However, their tracking performance is unstable across different scenes because of their random feature selection [2].
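A minimal scikit-learn sketch of this randomized-ensemble idea follows; the synthetic features stand in for real patch descriptors, so the data and dimensions are purely illustrative.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Randomized learning for object/background classification. Rows of X are
# feature vectors of image patches (assumed precomputed); y marks each
# patch as object (1) or background (0).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))              # 200 patches, 64-dim features
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)

forest = RandomForestClassifier(
    n_estimators=50,
    max_features="sqrt",   # random feature selection at each split
    bootstrap=True,        # random input selection for each tree
).fit(X, y)

# Score candidate patches in a new frame; the highest probability wins.
candidates = rng.normal(size=(10, 64))
print(forest.predict_proba(candidates)[:, 1])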

 

 

2.4 Discriminant analysis-based discriminative appearance models (DADAMs)

Discriminant analysis is a powerful tool for supervised subspace learning. In principle, its goal is to find a low-dimensional subspace with high inter-class separability. According to the learning scheme used, it can be split into two branches: conventional discriminant analysis and graph-driven discriminant analysis. In general, conventional DADAMs are formulated in a vector space, while graph-driven DADAMs utilize graphs for supervised subspace learning [2].
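Below is a minimal sketch of conventional discriminant analysis with scikit-learn, again on synthetic stand-in features; the object and background patch descriptors are assumed to be given.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Project patch features into a low-dimensional subspace that separates
# object from background (two classes, so at most one LDA component).
rng = np.random.default_rng(1)
obj = rng.normal(loc=1.0, size=(100, 32))   # object-patch features (assumed)
bg = rng.normal(loc=-1.0, size=(100, 32))   # background-patch features (assumed)
X = np.vstack([obj, bg])
y = np.array([1] * 100 + [0] * 100)

lda = LinearDiscriminantAnalysis(n_components=1).fit(X, y)
z = lda.transform(X)                        # 1-D discriminative subspace
print("class separation:", z[y == 1].mean() - z[y == 0].mean())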

 

3. Motion model

The motion model is essentially a feature matching problem, which is discussed briefly here. The main methods are the optical flow model and Bayesian filtering, both of which are widely used.

3.1 Optical Flow

The optical flow method is based on the assumption of constant brightness across frames. This assumption holds if the illumination conditions do not change drastically or the frame rate is high.
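A minimal OpenCV sketch of sparse Lucas-Kanade optical flow between two frames follows; the detector and window parameters are typical defaults, not prescribed values.

import cv2

def flow_between(prev_gray, next_gray):
    """Sparse Lucas-Kanade optical flow between two grayscale frames.

    Relies on the constant-brightness assumption described above; the
    returned point pairs give the feature motion.
    """
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                  qualityLevel=0.01, minDistance=7)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None,
                                              winSize=(21, 21), maxLevel=3)
    good = status.ravel() == 1              # keep only successfully matched points
    return pts[good].reshape(-1, 2), nxt[good].reshape(-1, 2)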

 

3.2 Bayesian Filtering Framework

In the Bayesian filtering framework [10] (e.g. the Kalman filter and the particle filter), we want to recursively estimate the current target state vector each time a new observation is received. We use z_t and x_t to respectively represent the target's motion state and appearance (e.g. positive/negative sample) at time t.
 

In [3], several methods are validated on a challenging sequence. The prototype is a vision-guided mobile robot following a person. The primary challenges come from:

1) Back-view tracking: since the target person usually has their back to the camera, face or frontal information is not available.

2) Background distraction: the target person walks down a corridor wearing a white uniform similar to the white wall.

3) Scale variation: the scale changes between 120% and 50% of the initial size.

4) Pose and shape variation: when the person turns around at the corner, there is pose variation.

5) Lightness variation: the indoor scene is generally not very bright, but lights flash and strong sunlight comes through the window.

6) Occlusion from objects with similar appearance.

 

 

Conclusion

In this work, a survey of image representations, appearance models, and motion models is presented. As noted in [2], visual representations focus more on how to robustly describe the spatio-temporal characteristics of object appearance, while the statistical modeling schemes for tracking-by-detection put more emphasis on how to capture the generative appearance information of the object regions. These modules are closely related and interleaved with each other. In practice, powerful appearance models depend not only on effective visual representations but also on robust statistical models. Optical-flow-based motion estimation suffers from mismatches of feature points, and in scenes with drastic illumination change, the constant-brightness hypothesis may not hold.


References

[1] A. Yilmaz, O. Javed, and M. Shah, "Object tracking: A survey," ACM Comput. Surv., vol. 38, no. 4, Article 13, Dec. 2006.

[2] X. Li, W. Hu, C. Shen, Z. Zhang, A. Dick, and A. van den Hengel, "A survey of appearance models in visual object tracking," ACM Trans. Intell. Syst. Technol., vol. 4, no. 4, Article 58, Sep. 2013.

[3] K. Cannons, "A review of visual tracking," Dept. Comput. Sci. Eng., York Univ., Toronto, ON, Canada, Tech. Rep. CSE-2008-07, 2008.

[4] A. W. M. Smeulders, D. M. Chu, R. Cucchiara, S. Calderara, A. Dehghan, and M. Shah, "Visual tracking: An experimental survey," IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 7, pp. 1442–1468, Jul. 2014.

[5] B. Efros, "Adaptive color space switching for face tracking in multi-colored lighting environments," in Proc. IEEE Int. Conf. Automatic Face and Gesture Recognition (FG), 2002.

[6] H. Nguyen and A. Smeulders, "Robust tracking using foreground-background texture discrimination," Int. J. Comput. Vis., vol. 69, no. 3, pp. 277–293, 2006.

[7] C. Veenman, M. Reinders, and E. Backer, "Resolving motion correspondence for densely moving points," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 1, pp. 54–72, 2001.

[8] D. Serby, E. Koller-Meier, and L. Van Gool, "Probabilistic object tracking using multiple features," in Proc. IEEE Int. Conf. Pattern Recognition (ICPR), 2004, pp. 184–187.

[9] D. Comaniciu, V. Ramesh, and P. Meer, "Kernel-based object tracking," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, pp. 564–575, 2003.

[10] A. Doucet, S. Godsill, and C. Andrieu, "On sequential Monte Carlo sampling methods for Bayesian filtering," Statistics and Computing, vol. 10, no. 3, pp. 197–208, 2000.