A system for vehicle recognition in video based on SIFT features, mixture models, and support vector machines

Abhikesh Nag, David Jonathan Miller, Andrew P. Brown, Kevin J. Sullivan

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    1 Citation (Scopus)

    Abstract

    We present a system for scale- and affine-invariant recognition of vehicles in video sequences. We model the object with local descriptors (SIFT keypoints) extracted from image frames. These features are claimed in the literature to be highly distinctive and invariant to rotation, scale, and affine transformations. However, because the SIFT keypoints extracted from an object are instance-specific (variable), they form a dynamic feature space. This poses a challenge for classification techniques, which generally require the same set of features for every instance of an object to be classified. To resolve this difficulty, we associate the extracted keypoints with the components (representative keypoints) of a mixture model for each target class. While the extracted keypoints are variable, the mixture components are fixed. The mixture models the keypoint features as well as the location and scale at which each keypoint was detected in the frame. Keypoint-to-component association is achieved via a switching optimization procedure that locally maximizes the joint likelihood of the keypoints and their locations and scales, with the latter based on an affine transformation. To each mixture component of a class, we link a first-layer support vector machine (SVM) classifier, which votes for or against the hypothesis that the keypoint associated with the component belongs to the model's target class. A second-layer SVM pools the votes from the ensemble of first-layer SVM classifiers and gives the final class decision. We show promising experimental results for video sequences from the VIVID database.
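The pipeline described in the abstract (variable keypoints mapped onto fixed mixture components, one vote per component, then a pooled decision) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `associate`, `linear_vote`, and `classify` are hypothetical, nearest-neighbor matching stands in for the switching optimization over an affine transformation, and fixed linear decision functions stand in for trained SVMs. The toy two-dimensional "descriptors" and weights are made up for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Linear = Tuple[Vector, float]  # (weights, bias) of a hypothetical linear classifier


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def associate(keypoints: List[Vector], components: List[Vector]) -> List[Vector]:
    """Map a variable-size keypoint set onto a fixed set of components.

    Each component receives its closest extracted keypoint, so the output
    always has len(components) entries regardless of how many keypoints
    the frame produced. ASSUMPTION: this nearest-neighbor rule replaces the
    paper's likelihood-based switching optimization; keypoint location and
    scale under an affine transformation are not modeled here.
    """
    if not keypoints:
        raise ValueError("at least one keypoint is required")
    return [min(keypoints, key=lambda kp: sq_dist(kp, c)) for c in components]


def linear_vote(x: Vector, w: Vector, b: float) -> int:
    """Stand-in for a trained SVM: +1 (for the class) or -1 (against)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


def classify(
    keypoints: List[Vector],
    components: List[Vector],
    first_layer: List[Linear],
    second_layer: Linear,
) -> int:
    """Two-layer decision: one vote per component, then a pooled decision."""
    matched = associate(keypoints, components)
    votes = [linear_vote(x, w, b) for x, (w, b) in zip(matched, first_layer)]
    w2, b2 = second_layer
    return linear_vote(votes, w2, b2)


# Toy setup: two components, one first-layer classifier per component.
components = [(0.0, 0.0), (1.0, 1.0)]
first_layer = [((-1.0, -1.0), 0.5), ((1.0, 1.0), -1.5)]
second_layer = ((1.0, 1.0), 0.0)  # pools the two votes with equal weight

frame_a = [(0.1, 0.0), (0.9, 1.1), (5.0, 5.0)]  # three keypoints
frame_b = [(0.4, 0.4)]                          # a single keypoint

decision_a = classify(frame_a, components, first_layer, second_layer)
decision_b = classify(frame_b, components, first_layer, second_layer)
```

Both frames yield exactly two matched keypoints even though they contain different numbers of extracted keypoints, which is the property that lets a fixed bank of per-component classifiers operate on a dynamic feature space.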

    Original language: English (US)
    Title of host publication: Intelligent Computing
    Subtitle of host publication: Theory and Applications V
    Volume: 6560
    DOIs: https://doi.org/10.1117/12.723746
    State: Published - Nov 15 2007
    Event: Intelligent Computing: Theory and Applications V - Orlando, FL, United States
    Duration: Apr 9 2007 to Apr 10 2007

    Other

    Other: Intelligent Computing: Theory and Applications V
    Country: United States
    City: Orlando, FL
    Period: 4/9/07 to 4/10/07

    Fingerprint

    Scale Invariant Feature Transform
    Mixture Model
    Support vector machines
    Support Vector Machine
    vehicles
    Vote
    classifiers
    Classifiers
    Affine transformation
    Classifier
    Affine Invariant
    Target
    Scale Invariant
    Feature Space
    Descriptors
    Resolve
    Likelihood
    Ensemble
    Maximise
    optimization

    All Science Journal Classification (ASJC) codes

    • Electronic, Optical and Magnetic Materials
    • Condensed Matter Physics
    • Computer Science Applications
    • Applied Mathematics
    • Electrical and Electronic Engineering

    Cite this

    Nag, A., Miller, D. J., Brown, A. P., & Sullivan, K. J. (2007). A system for vehicle recognition in video based on SIFT features, mixture models, and support vector machines. In Intelligent Computing: Theory and Applications V (Vol. 6560). [65600G] https://doi.org/10.1117/12.723746
    @inproceedings{58e7458b24e14b93b4a243d056280e59,
    title = "A system for vehicle recognition in video based on SIFT features, mixture models, and support vector machines",
    author = "Abhikesh Nag and Miller, {David Jonathan} and Brown, {Andrew P.} and Sullivan, {Kevin J.}",
    year = "2007",
    month = "11",
    day = "15",
    doi = "10.1117/12.723746",
    language = "English (US)",
    isbn = "0819466824",
    volume = "6560",
    booktitle = "Intelligent Computing",

    }


    TY - GEN

    T1 - A system for vehicle recognition in video based on SIFT features, mixture models, and support vector machines

    AU - Nag, Abhikesh

    AU - Miller, David Jonathan

    AU - Brown, Andrew P.

    AU - Sullivan, Kevin J.

    PY - 2007/11/15

    Y1 - 2007/11/15

    UR - http://www.scopus.com/inward/record.url?scp=35948991551&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=35948991551&partnerID=8YFLogxK

    U2 - 10.1117/12.723746

    DO - 10.1117/12.723746

    M3 - Conference contribution

    SN - 0819466824

    SN - 9780819466822

    VL - 6560

    BT - Intelligent Computing

    ER -
