Combined generative-discriminative learning for object recognition using local image descriptors

Ahhikesh Nag, David Jonathan Miller, Andrew P. Brown, Kevin J. Sullivan

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    4 Citations (Scopus)

    Abstract

    We present a system for scale- and affine-invariant recognition of vehicular objects in video sequences. We use local descriptors (SIFT keypoints) from image frames to model the object. These features are claimed in the literature to be highly distinctive and invariant to rotation, scale, and affine transformations. However, since the SIFT keypoints that are extracted from an object are instance-specific (variable), they form a dynamic feature space. This presents certain challenges for classification techniques, which generally require use of the same set of features for every instance of an object to be classified. To resolve this difficulty, we associate the extracted keypoints to the components (representative keypoints) in a mixture model for each target class. While the extracted keypoints are variable, the mixture components are fixed. The mixture models the keypoint features, as well as the location and scale at which each keypoint was detected in the frame. Keypoint-to-component association is achieved via a switching optimization procedure that locally maximizes the joint likelihood of keypoints and their locations and scales, with the latter based on an affine transformation. To each mixture component from a class, we link a (first-layer) support vector machine (SVM) classifier which votes for or against the hypothesis that the keypoint associated to the component belongs to the model's target class. A second-layer SVM pools the votes from the ensemble of SVM classifiers in the first layer and gives the final class decision. We show promising results of experiments for video sequences from the VIVID database.
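The way fixed mixture components turn a variable-size keypoint set into a fixed-length representation can be illustrated with a minimal sketch. It assumes isotropic Gaussian components with a shared variance and uses descriptor vectors only. The affine location/scale term and the switching optimization described in the abstract are omitted, and the names `log_gauss` and `associate` are hypothetical, not taken from the paper.

```python
import math


def log_gauss(x, mean, var):
    """Log density of an isotropic Gaussian (assumed form, shared variance)."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * (d * math.log(2 * math.pi * var) + sq / var)


def associate(keypoints, means, var=1.0):
    """Map a variable-size keypoint set onto one slot per fixed component.

    Each keypoint is assigned to its most likely component. Entry j of the
    result is the index of the best-scoring keypoint assigned to component j,
    or None when no keypoint chose component j.
    """
    best = [None] * len(means)
    best_ll = [-math.inf] * len(means)
    for i, kp in enumerate(keypoints):
        lls = [log_gauss(kp, m, var) for m in means]
        j = max(range(len(means)), key=lambda c: lls[c])
        if lls[j] > best_ll[j]:
            best[j], best_ll[j] = i, lls[j]
    return best


# Toy 2-D "descriptors": three fixed components, three extracted keypoints.
means = [[0.0, 0.0], [10.0, 10.0], [20.0, 20.0]]
keypoints = [[9.5, 10.2], [0.1, -0.2], [0.4, 0.5]]
print(associate(keypoints, means))  # [1, 0, None]
```

Whatever the number of extracted keypoints, the output has one slot per component, which is what lets a per-component (first-layer) classifier see a consistent input. How unfilled slots are handled is not stated in the abstract, so the `None` convention here is only an assumption.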

    Original language: English (US)
    Title of host publication: Machine Learning for Signal Processing 17 - Proceedings of the 2007 IEEE Signal Processing Society Workshop, MLSP
    Pages: 360-365
    Number of pages: 6
    DOIs: https://doi.org/10.1109/MLSP.2007.4414333
    State: Published - Dec 1 2007
    Event: 17th IEEE International Workshop on Machine Learning for Signal Processing, MLSP-2007 - Thessaloniki, Greece
    Duration: Aug 27 2007 → Aug 29 2007

    Other

    Other: 17th IEEE International Workshop on Machine Learning for Signal Processing, MLSP-2007
    Country: Greece
    City: Thessaloniki
    Period: 8/27/07 → 8/29/07

    Fingerprint

    Object recognition
    Support vector machines
    Classifiers
    Experiments

    All Science Journal Classification (ASJC) codes

    • Computer Science(all)
    • Signal Processing

    Cite this

    Nag, A., Miller, D. J., Brown, A. P., & Sullivan, K. J. (2007). Combined generative-discriminative learning for object recognition using local image descriptors. In Machine Learning for Signal Processing 17 - Proceedings of the 2007 IEEE Signal Processing Society Workshop, MLSP (pp. 360-365). [4414333] https://doi.org/10.1109/MLSP.2007.4414333
    @inproceedings{705d58c6bdd84bc0acecb1c6377a52fe,
    title = "Combined generative-discriminative learning for object recognition using local image descriptors",
    author = "Ahhikesh Nag and Miller, {David Jonathan} and Brown, {Andrew P.} and Sullivan, {Kevin J.}",
    year = "2007",
    month = "12",
    day = "1",
    doi = "10.1109/MLSP.2007.4414333",
    language = "English (US)",
    isbn = "1424415667",
    pages = "360--365",
    booktitle = "Machine Learning for Signal Processing 17 - Proceedings of the 2007 IEEE Signal Processing Society Workshop, MLSP",

    }

    TY - GEN

    T1 - Combined generative-discriminative learning for object recognition using local image descriptors

    AU - Nag, Ahhikesh

    AU - Miller, David Jonathan

    AU - Brown, Andrew P.

    AU - Sullivan, Kevin J.

    PY - 2007/12/1

    Y1 - 2007/12/1

    UR - http://www.scopus.com/inward/record.url?scp=48149102522&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=48149102522&partnerID=8YFLogxK

    U2 - 10.1109/MLSP.2007.4414333

    DO - 10.1109/MLSP.2007.4414333

    M3 - Conference contribution

    SN - 1424415667

    SN - 9781424415663

    SP - 360

    EP - 365

    BT - Machine Learning for Signal Processing 17 - Proceedings of the 2007 IEEE Signal Processing Society Workshop, MLSP

    ER -
