Inferring censored geo-information with non-representative data

Yu Zhang, Tse Chuan Yang, Stephen Augustus Matthews

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

The goal of this study is to develop a method capable of inferring geo-locations for non-representative data. To protect the privacy of surveyed individuals, most data collectors release coarse geo-information (e.g., tract) rather than detailed geo-information (e.g., street, apt number) when sharing survey data. Without exact locations, many point-based analyses cannot be performed. While several scholars have developed new methods to address this issue, little attention has been paid to how to correct it when the data are not representative. To fill this knowledge gap, we propose a bias correction method that adjusts for the bias using a bias factor approach. Applying our method to an empirical data set with a known bias associated with gender, we found that it could generate reliable results despite the non-representativeness of the sample.

Original language: English (US)
Title of host publication: Machine Learning and Data Mining in Pattern Recognition - 12th International Conference, MLDM 2016, Proceedings
Editors: Petra Perner
Publisher: Springer Verlag
Pages: 229-235
Number of pages: 7
ISBN (Print): 9783319419190
DOIs: https://doi.org/10.1007/978-3-319-41920-6_17
State: Published - Jan 1 2016
Event: 12th International Conference on Machine Learning and Data Mining in Pattern Recognition, MLDM 2016 - New York, United States
Duration: Jul 16 2016 → Jul 21 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9729
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 12th International Conference on Machine Learning and Data Mining in Pattern Recognition, MLDM 2016
Country: United States
City: New York
Period: 7/16/16 → 7/21/16

Fingerprint

Bias Correction
Data Sharing
Privacy
Knowledge
Gender

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Zhang, Y., Yang, T. C., & Matthews, S. A. (2016). Inferring censored geo-information with non-representative data. In P. Perner (Ed.), Machine Learning and Data Mining in Pattern Recognition - 12th International Conference, MLDM 2016, Proceedings (pp. 229-235). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9729). Springer Verlag. https://doi.org/10.1007/978-3-319-41920-6_17
Zhang, Yu ; Yang, Tse Chuan ; Matthews, Stephen Augustus. / Inferring censored geo-information with non-representative data. Machine Learning and Data Mining in Pattern Recognition - 12th International Conference, MLDM 2016, Proceedings. editor / Petra Perner. Springer Verlag, 2016. pp. 229-235 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{677de846857146c5b1f60c75104a920c,
title = "Inferring censored geo-information with non-representative data",
abstract = "The goal of this study is to develop a method that is capable of inferring geo-locations for non-representative data. In order to protect privacy of surveyed individuals, most data collectors release coarse geo-information (e.g., tract), rather than detailed geo-information (e.g., street, apt number) when sharing surveyed data. Without the exact locations, many point-based analyses cannot be performed. While several scholars have developed new methods to address this issue, little attention has been paid to how to correct this issue when data are not representative. To fill this knowledge gap, we propose a bias correction method that adjusts for the bias using a bias factor approach. Applying our method to an empirical data set with a known bias associated with gender, we found that our method could generate reliable results despite the non-representativeness of the sample.",
author = "Yu Zhang and Yang, {Tse Chuan} and Matthews, {Stephen Augustus}",
year = "2016",
month = "1",
day = "1",
doi = "10.1007/978-3-319-41920-6_17",
language = "English (US)",
isbn = "9783319419190",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "229--235",
editor = "Petra Perner",
booktitle = "Machine Learning and Data Mining in Pattern Recognition - 12th International Conference, MLDM 2016, Proceedings",
address = "Germany",

}

Zhang, Y, Yang, TC & Matthews, SA 2016, Inferring censored geo-information with non-representative data. in P Perner (ed.), Machine Learning and Data Mining in Pattern Recognition - 12th International Conference, MLDM 2016, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9729, Springer Verlag, pp. 229-235, 12th International Conference on Machine Learning and Data Mining in Pattern Recognition, MLDM 2016, New York, United States, 7/16/16. https://doi.org/10.1007/978-3-319-41920-6_17

TY - GEN

T1 - Inferring censored geo-information with non-representative data

AU - Zhang, Yu

AU - Yang, Tse Chuan

AU - Matthews, Stephen Augustus

PY - 2016/1/1

Y1 - 2016/1/1

AB - The goal of this study is to develop a method that is capable of inferring geo-locations for non-representative data. In order to protect privacy of surveyed individuals, most data collectors release coarse geo-information (e.g., tract), rather than detailed geo-information (e.g., street, apt number) when sharing surveyed data. Without the exact locations, many point-based analyses cannot be performed. While several scholars have developed new methods to address this issue, little attention has been paid to how to correct this issue when data are not representative. To fill this knowledge gap, we propose a bias correction method that adjusts for the bias using a bias factor approach. Applying our method to an empirical data set with a known bias associated with gender, we found that our method could generate reliable results despite the non-representativeness of the sample.

UR - http://www.scopus.com/inward/record.url?scp=84979057671&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84979057671&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-41920-6_17

DO - 10.1007/978-3-319-41920-6_17

M3 - Conference contribution

SN - 9783319419190

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 229

EP - 235

BT - Machine Learning and Data Mining in Pattern Recognition - 12th International Conference, MLDM 2016, Proceedings

A2 - Perner, Petra

PB - Springer Verlag

ER -

Zhang Y, Yang TC, Matthews SA. Inferring censored geo-information with non-representative data. In Perner P, editor, Machine Learning and Data Mining in Pattern Recognition - 12th International Conference, MLDM 2016, Proceedings. Springer Verlag. 2016. p. 229-235. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-319-41920-6_17