A Survey on Visual Object Tracking Based on Biologically Inspired Trackers
Tracking is the process of locating, recognizing, and determining the dynamic configuration of one or more moving (possibly deformable) objects, or parts of objects, in each frame from one or several cameras. To build a general tracking system, this paper briefly reviews recent progress in image representation, appearance models, and motion models. The models reviewed here are basic enough to be applied to tracking either a single target or multiple targets. Special attention is given to appearance models, which have been the focus of much recent work. The key techniques are discussed, together with the appearance changes that make tracking difficult: camera motion, illumination variation, shape deformation, and partial occlusion. Tracking-by-detection and online boosting methods (e.g., TLD, Online Boost, MIL-Track) achieve state-of-the-art performance; hence, we evaluate them together for single-person tracking.
Tracking plays a vital role in computer vision. A video consists of a sequence of frames in which the object moves from spot to spot. The tracker finds, in each frame, the part that is most similar to the original object; this is known as object tracking. Object tracking plays a role in the following tasks:
Vehicle navigation – obstacle avoidance and video-based route planning.
Video indexing – retrieval of videos from a database.
Surveillance – monitoring suspicious activities.
Traffic monitoring – gathering real-time traffic statistics to direct traffic flow.
Object tracking has a wide assortment of uses, including motion analysis, video surveillance, human-computer interaction, and robot perception. Object tracking can be difficult due to real-time processing requirements, noise in images, and information loss.
It has been intensively explored over the previous decade. To enhance visual object tracking, these difficulties can be addressed by developing better feature representations of visual targets and more effective tracking models. In the following, image representation and appearance models are discussed. In image representation, features can be described by texture, points, contours, and shape. For tracking, object detection can be adapted from any representation; examples include ships at sea, cars on a road, and fish in a tank. This section first reviews the typical image features and object shape representations commonly employed for tracking, and then addresses joint shape representations.
1.1 Typical Image Features
– Color features. In [3], Xiang describes color features (e.g., the color histogram), which have low computational cost and are invariant to point-to-point transformations. In [5], Efros selects the best color features from five color spaces to model skin color for face tracking. However, color features are not robust against illumination changes, and their lack of spatial information makes them insufficiently discriminative.
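As a concrete illustration of the low cost of color features, the following is a minimal sketch, using NumPy, of the normalized joint RGB histogram that color-based trackers typically build (the bin count of 8 per channel is an arbitrary illustrative choice):

```python
import numpy as np

def color_histogram(image, bins=8):
    """Normalized joint RGB histogram of an H x W x 3 uint8 image."""
    # Quantize each channel into `bins` levels (256 // bins values per level).
    quantized = (image // (256 // bins)).reshape(-1, 3)
    # Map each pixel's (r, g, b) bin triple to a single flat bin index.
    flat = quantized[:, 0] * bins * bins + quantized[:, 1] * bins + quantized[:, 2]
    hist = np.bincount(flat, minlength=bins ** 3).astype(float)
    return hist / hist.sum()  # normalize so the histogram sums to 1

# Toy example: a solid red image puts all its mass into a single bin,
# regardless of where in the frame the red pixels sit.
red = np.zeros((4, 4, 3), dtype=np.uint8)
red[..., 0] = 200
h = color_histogram(red)
print(h.sum(), h.max())  # 1.0 1.0
```

Because the histogram discards pixel positions, two images with the same colors in different layouts produce identical histograms, which is exactly the lack of spatial information noted above.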
– Texture features. Texture features (e.g., LBP) have high discriminative ability, though they are computationally expensive. Nguyen and Smeulders classify texture features using LDA [6].
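A minimal sketch of the basic 8-neighbour LBP operator mentioned above (plain NumPy; the neighbour ordering and the `>=` comparison convention are implementation choices, not prescribed by the text):

```python
import numpy as np

def lbp_image(gray):
    """Basic 8-neighbour LBP code for each interior pixel of a 2-D array."""
    c = gray[1:-1, 1:-1]  # centre pixels
    # Neighbours in clockwise order starting at the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = gray[1 + dy : gray.shape[0] - 1 + dy,
                         1 + dx : gray.shape[1] - 1 + dx]
        # Set this bit wherever the neighbour is at least as bright as the centre.
        code |= (neighbour >= c).astype(np.uint8) << bit
    return code

# A flat patch yields the all-ones code: every neighbour >= centre.
flat = np.full((3, 3), 7, dtype=np.uint8)
print(lbp_image(flat))  # [[255]]
```

The 8-bit code per pixel hints at why LBP is discriminative yet more expensive than a global color histogram: it encodes the local spatial structure around every pixel.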
– Points. In general, the point representation is suitable for tracking objects that occupy small regions in an image [1]. The object is represented by a point, that is, the centroid (Figure 1(a)) [7], or by a set of points (Figure 1(b)) [8].
– Primitive geometric shapes. Object motion for such representations is usually modeled by translation, affine, or projective (homography) transformation [1]. Object shape is represented by a rectangle, ellipse (Figure 1(c), (d)) [9], etc. Though primitive geometric shapes are more suitable for representing simple rigid objects, they are also used for tracking nonrigid objects [1].
Fig. 1. Object representations. (a)
Centroid, (b) multiple points, (c) rectangular patch, (d) elliptical patch, (e)
part-based multiple patches, (f) object skeleton, (g) complete object contour,
(h) control points on object contour, (i) object silhouette.
– Object silhouette and contour. The contour representation defines the boundary of an object (Figure 1(g), (h)). The region inside the contour is called the silhouette of the object (Figure 1(i)) [1].
– Articulated shape models. Articulated objects are composed of body parts that are held together with joints. For example, the human body is an articulated object with torso, legs, hands, head, and feet connected by joints. The relationships between the parts are governed by kinematic motion models, for example, joint angles. To represent an articulated object, one can model the constituent parts using cylinders or ellipses, as shown in Figure 1(e) [1].
In real-world surveillance scenes, target appearance tends to change during tracking (i.e., variation in target appearance), and the background may include moving objects (i.e., variation in the scene). The less the target's appearance model is associated with those variations, the more specific it is in representing that particular object, and the less likely the tracker is to become confused by other objects or background clutter [3].
Kernel-based generative appearance models (KGAMs)
Kernel-based generative appearance models (KGAMs) utilize kernel density estimation to construct kernel-based visual representations, and then carry out mean-shift iterations for object localization. This family is divided into three branches: color-driven KGAMs, shape-integration KGAMs, and non-symmetric KGAMs [2].
The color-driven KGAM [9] builds a color-based visual representation regularized by a spatially smooth isotropic kernel. However, the tracker of Comaniciu et al. [9] considers only color information and ignores other useful cues such as edge and shape, resulting in sensitivity to background clutter and occlusions.
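The mean-shift localization step used by such trackers can be sketched as follows. This is a simplified illustration over a precomputed weight map (standing in for the histogram back-projection a color-driven tracker would compute); the window size, iteration count, and convergence threshold are arbitrary choices:

```python
import numpy as np

def mean_shift(weights, center, half=5, iters=20, eps=0.5):
    """Shift a square window to the weighted centroid of `weights` until it converges."""
    cy, cx = center
    for _ in range(iters):
        # Clip the window to the image bounds.
        y0, y1 = max(0, cy - half), min(weights.shape[0], cy + half + 1)
        x0, x1 = max(0, cx - half), min(weights.shape[1], cx + half + 1)
        w = weights[y0:y1, x0:x1]
        if w.sum() == 0:
            break
        # New centre = weighted mean of pixel coordinates inside the window.
        ys, xs = np.mgrid[y0:y1, x0:x1]
        ny = int(round((w * ys).sum() / w.sum()))
        nx = int(round((w * xs).sum() / w.sum()))
        if abs(ny - cy) <= eps and abs(nx - cx) <= eps:  # converged
            cy, cx = ny, nx
            break
        cy, cx = ny, nx
    return cy, cx

# A blob of high weight around (30, 40); start the window nearby and let
# mean shift climb onto the mode.
w = np.zeros((64, 64))
w[28:33, 38:43] = 1.0
print(mean_shift(w, (25, 35)))  # (30, 40)
```

The sketch also makes the weakness noted above concrete: if an occluder or background clutter produces high weights near the window, the centroid is pulled toward them, since only the weight map (here, color likelihood) drives the shift.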
Shape-integration KGAMs. The main aim of shape integration is to build a kernel function in the joint color-shape space. It is based on two spatially normalized and rotationally symmetric kernels describing the color and object-boundary information [2].
Non-symmetric KGAMs. Conventional KGAMs use a symmetric kernel (e.g., a circle or ellipse), leading to a large estimation bias in the process of estimating the density function. A non-symmetric KGAM therefore needs to simultaneously estimate the image coordinates, scales, and orientations within a small number of mean-shift iterations [2].
Boosting-based discriminative appearance models (BDAMs)
Boosting-based discriminative appearance models (BDAMs) are widely used in visual object tracking because of their powerful discriminative learning capabilities. They are classified into self-learning and co-learning BDAMs: self-learning BDAMs use discriminative information from a single source to guide object/non-object classification, while co-learning BDAMs exploit discriminative information from multiple sources.
BDAMs also take different strategies for visual representation, namely single-instance and multi-instance ones. Single-instance BDAMs require precise object localization; if a precise localization is not available, these tracking algorithms may use sub-optimal positive samples to update their object/non-object discriminative classifiers, which can lead to model drift. Moreover, object detection and tracking have an inherent ambiguity: precise object locations may be unknown even to human labelers. To deal with this ambiguity, multi-instance BDAMs represent the object by a set of image patches around the tracker location.
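A self-learning BDAM relies on a boosted object/non-object classifier at its core. The following is a toy sketch of discrete AdaBoost over threshold stumps, not any specific tracker's implementation; samples labeled +1 stand for object patches and -1 for background patches:

```python
import numpy as np

def train_adaboost_stumps(X, y, rounds=3):
    """Discrete AdaBoost over axis-aligned threshold stumps.

    X: (n, d) feature matrix, y: labels in {-1, +1} (object / non-object).
    Returns a list of (feature, threshold, polarity, alpha) weak learners.
    """
    n = len(y)
    w = np.full(n, 1.0 / n)  # sample weights, re-weighted each round
    learners = []
    for _ in range(rounds):
        best = None
        # Exhaustively pick the stump with the lowest weighted error.
        for f in range(X.shape[1]):
            for t in np.unique(X[:, f]):
                for p in (1, -1):
                    pred = p * np.sign(X[:, f] - t + 1e-12)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, f, t, p, pred)
        err, f, t, p, pred = best
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)  # weight of this weak learner
        w *= np.exp(-alpha * y * pred)         # boost misclassified samples
        w /= w.sum()
        learners.append((f, t, p, alpha))
    return learners

def predict(learners, X):
    score = sum(a * p * np.sign(X[:, f] - t + 1e-12) for f, t, p, a in learners)
    return np.sign(score)

# Toy data: feature 0 separates object from non-object samples.
X = np.array([[0.1, 5], [0.2, 1], [0.9, 5], [0.8, 1]])
y = np.array([-1, -1, 1, 1])
model = train_adaboost_stumps(X, y)
print(predict(model, X))  # [-1. -1.  1.  1.]
```

The re-weighting line `w *= np.exp(-alpha * y * pred)` is where the drift risk described above enters: if positive samples are cropped from a sub-optimal location, the booster confidently fits the wrong appearance.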
Randomized learning-based discriminative appearance models (RLDAMs)
In principle, randomized learning techniques can build a diverse classifier ensemble by performing random input selection and random feature selection. In contrast to boosting and SVMs, they are more computationally efficient and easier to extend to multi-class learning problems. However, their tracking performance can be unstable in complex scenes because of this randomness.
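The random input and feature selection described above can be sketched as a toy ensemble of stumps, each trained on a bootstrap sample of the inputs and one randomly picked feature; the mean-valued threshold rule here is a crude placeholder, not a published method:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_random_ensemble(X, y, n_learners=25):
    """Ensemble of stumps built with random input + random feature selection."""
    learners = []
    n = len(y)
    for _ in range(n_learners):
        f = int(rng.integers(X.shape[1]))  # random feature selection
        idx = rng.integers(0, n, size=n)   # random input selection (bootstrap)
        t = X[idx, f].mean()               # crude, illustrative threshold
        # Orient the stump so it agrees with the sampled labels.
        pred = np.sign(X[idx, f] - t + 1e-12)
        p = 1 if (pred == y[idx]).mean() >= 0.5 else -1
        learners.append((f, t, p))
    return learners

def vote(learners, X):
    # Majority vote over the ensemble.
    votes = sum(p * np.sign(X[:, f] - t + 1e-12) for f, t, p in learners)
    return np.sign(votes)

X = np.array([[0.1, 1.0], [0.2, 2.0], [0.9, 8.0], [0.8, 9.0]])
y = np.array([-1, -1, 1, 1])
model = train_random_ensemble(X, y)
print(vote(model, X))
```

The instability noted above shows up directly here: each run of the random selections yields a different ensemble, and individual stumps can land on uninformative thresholds; only the vote smooths this out.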
Discriminant analysis-based discriminative appearance models (DADAMs)
Discriminant analysis is a powerful tool for supervised subspace learning. In principle, its goal is to find a low-dimensional subspace with high inter-class separability. According to the learning scheme used, it can be split into two branches: conventional discriminant analysis and graph-driven discriminant analysis. In general, conventional DADAMs are formulated in a vector space, while graph-driven DADAMs utilize graphs for supervised subspace learning [2].
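For the conventional (vector-space) branch, the two-class Fisher discriminant illustrates the idea: find a projection direction that maximizes inter-class separation relative to within-class scatter. A minimal NumPy sketch on synthetic data (the ridge term and cluster parameters are illustrative choices):

```python
import numpy as np

def fisher_lda_direction(X_pos, X_neg):
    """Two-class Fisher discriminant direction w = Sw^-1 (mu_pos - mu_neg)."""
    mu_p, mu_n = X_pos.mean(axis=0), X_neg.mean(axis=0)
    # Within-class scatter, with a small ridge for numerical stability.
    Sw = np.cov(X_pos, rowvar=False) + np.cov(X_neg, rowvar=False)
    w = np.linalg.solve(Sw + 1e-6 * np.eye(len(mu_p)), mu_p - mu_n)
    return w / np.linalg.norm(w)

# Two clusters (object vs. background features) separated along axis 0.
rng = np.random.default_rng(1)
pos = rng.normal([4.0, 0.0], 0.5, size=(50, 2))
neg = rng.normal([0.0, 0.0], 0.5, size=(50, 2))
w = fisher_lda_direction(pos, neg)
# Projected onto w, the classes should fall on opposite sides of the midpoint.
mid = (pos.mean(0) + neg.mean(0)) / 2
print((pos @ w > mid @ w).mean(), (neg @ w < mid @ w).mean())  # both near 1.0
```

Graph-driven DADAMs generalize exactly this objective by replacing the scatter matrices with graph Laplacians built from sample affinities.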
The motion model is essentially a problem of feature matching across frames, which is discussed briefly here. Two widely used methods are the optical flow model and Bayesian filtering.
3.1. Optical Flow
The optical flow method is based on the assumption of constant brightness across frames. This assumption holds when the illumination does not change drastically or the frame rate is high.
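Under brightness constancy, each pixel satisfies Ix*u + Iy*v + It = 0, and a Lucas-Kanade-style estimator solves for one translational flow (u, v) per patch by least squares. A minimal sketch on a synthetic ramp image (the patch-wide single-flow assumption is the illustrative simplification here):

```python
import numpy as np

def lucas_kanade_patch(I0, I1):
    """Single translational flow (u, v) for a whole patch, solved by
    least squares from the brightness-constancy equation Ix*u + Iy*v + It = 0."""
    Iy, Ix = np.gradient(I0.astype(float))    # spatial gradients (rows, cols)
    It = I1.astype(float) - I0.astype(float)  # temporal gradient
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

# A smooth intensity ramp shifted right by one pixel between frames.
x = np.arange(16, dtype=float)
I0 = np.tile(x, (16, 1))        # intensity increases left to right
I1 = np.tile(x - 1.0, (16, 1))  # same pattern shifted right by 1 pixel
u, v = lucas_kanade_patch(I0, I1)
print(round(u, 3), round(v, 3))  # u ≈ 1.0, v ≈ 0.0
```

If the illumination changes between the two frames, It picks up the brightness change as well as the motion, which is precisely why the constant-brightness assumption matters.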
3.2. Bayesian Filtering
In the Bayesian filtering framework [10] (e.g., the Kalman filter and the particle filter), we want to recursively estimate the current target state vector each time a new observation is received. We use z_t and x_t to respectively represent the target's motion state and appearance (e.g., a positive/negative sample) at time t.
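As an illustration of the Kalman-filter instance of this framework, the sketch below recursively estimates one coordinate of the target with a constant-velocity state model; the noise covariances Q and R are arbitrary illustrative values:

```python
import numpy as np

# Constant-velocity model for one coordinate of the target.
# State = [position, velocity]; observation = noisy position.
F = np.array([[1.0, 1.0], [0.0, 1.0]])  # state transition
H = np.array([[1.0, 0.0]])              # we only observe position
Q = 0.01 * np.eye(2)                    # process noise covariance (assumed)
R = np.array([[1.0]])                   # measurement noise covariance (assumed)

def kalman_step(x, P, z):
    # Predict the state forward one frame.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with the new observation z.
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(2) - K @ H) @ P
    return x, P

x, P = np.zeros(2), np.eye(2)
for z in [1.1, 1.9, 3.2, 3.9, 5.1]:     # noisy positions, roughly slope 1
    x, P = kalman_step(x, P, np.array([z]))
print(round(x[0], 2), round(x[1], 2))   # filtered position and velocity
```

Each iteration is exactly the recursive estimate described above: predict the state from the motion model, then correct it with the newly received observation.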
In [3], the methods are validated on a challenging sequence. The prototype is a vision-guided mobile robot following a person. The primary challenges are:
1) Back tracking: since the target person commonly has his or her back to the camera, face or frontal information is not available.
2) Background distraction: the target person walks down a corridor wearing a white uniform similar to the white wall.
3) Scale variation: the scale changes between 120% and 50% of the initial size.
4) Pose and shape variation: when the person turns around at a corner, pose variation occurs.
5) Lighting variation: the indoor scene is generally not very bright, but includes flashing lights and strong sunlight through the window.
6) Occlusion by objects with similar appearance.
In this work, a survey of appearance models, image representations, and motion models is presented. As noted in [2], visual representations focus more on how to robustly describe the spatio-temporal characteristics of object appearance, while the statistical modeling schemes for tracking-by-detection put more emphasis on how to capture the appearance information of the object regions. These modules are closely related and interleaved with each other. In practice, powerful appearance models depend not only on effective visual representations but also on robust statistical models. Optical-flow-based motion estimation suffers from mismatches of feature points. In scenes with drastic illumination change, the constant-brightness hypothesis may not hold.
References
1. A. Yilmaz, O. Javed, and M. Shah, "Object tracking: A survey," ACM Computing Surveys, vol. 38, no. 4, Article 13, Dec. 2006, 45 pp.
2. X. Li, W. Hu, C. Shen, Z. Zhang, A. Dick, and A. van den Hengel, "A survey of appearance models in visual object tracking," ACM Trans. Intell. Syst. Technol., vol. 4, no. 4, Article 58, Sep. 2013.
3. K. Cannons, "A review of visual tracking," Dept. Comput. Sci. Eng., York Univ., Toronto, ON, Canada, Tech. Rep. CSE-2008-07, 2008.
4. A. W. M. Smeulders, D. M. Chu, R. Cucchiara, S. Calderara, A. Dehghan, and M. Shah, "Visual tracking: An experimental survey," IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 7, pp. 1442–1468, Jul. 2014.
5. B. Efros, "Adaptive color space switching for face tracking in multi colored lighting environments," in Proc. FG, 2002.
6. H. Nguyen and A. Smeulders, "Robust tracking using foreground background texture discrimination," Int. J. Comput. Vis., vol. 69, no. 3, pp. 277–293, 2006.
7. C. Veenman, M. Reinders, and E. Backer, "Resolving motion correspondence for densely moving points," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 1, 2001.
8. D. Serby, S. Koller-Meier, and L. Van Gool, "Probabilistic object tracking using multiple features," in Proc. ICPR, 2004, pp. 184–187.
9. D. Comaniciu, V. Ramesh, and P. Meer, "Kernel-based object tracking," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, pp. 564–575, 2003.
10. A. Doucet, S. Godsill, and C. Andrieu, "On sequential Monte Carlo sampling methods for Bayesian filtering," Statistics and Computing, vol. 10, no. 3, pp. 197–208, 2000.