{"id":78,"date":"2020-10-29T15:27:05","date_gmt":"2020-10-29T19:27:05","guid":{"rendered":"http:\/\/sites.miamioh.edu\/the-center-for-analytics-and-data-science\/?p=78"},"modified":"2020-10-29T15:27:05","modified_gmt":"2020-10-29T19:27:05","slug":"insights-from-winning-asas-dataexpo","status":"publish","type":"post","link":"https:\/\/sites.miamioh.edu\/the-center-for-analytics-and-data-science\/2020\/10\/insights-from-winning-asas-dataexpo\/","title":{"rendered":"Insights from winning ASA&#8217;s DataExpo"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">When I was a prospective student touring colleges, I assumed\nresearch always involved test tubes and lab rats. However, as I reflect on the\nresearch project I completed this year, I am happy to say I did not spend any\ntime in a lab. Instead, all I needed was my computer to participate in the DataExpo,\na data science competition sponsored by the American Statistical Association\n(ASA).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Each year, students are given a government dataset and\nguiding questions, and then are expected to develop their own research project\nand present findings to judges at a statistics conference in the summer. This\nyear, participants were asked to analyze data from the Global Historical\nClimatology Network, which contains weather records for the entire world since the\n18<sup>th<\/sup> century. My project examining the relationship between public\nperception of climate change and county-level weather trends in the United\nStates won first place in the competition. 
I learned a lot about data-driven\nresearch projects through the DataExpo and thought I would share some of my\ninsights from the process.<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>Set boundaries to avoid becoming overwhelmed by an open-ended project.<\/strong><\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">My original idea for this project was to look at the social\nimpact of climate change around the world. This is clearly very different from\nmy final research question regarding public perception of climate change in the\nUnited States. Why did my research question change so drastically? It was\nimportant to narrow the scope of my analysis to something specific and\nmanageable.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In order to analyze my original question, I would have needed\nto define social impact as it relates to climate change, find data that related\nto my definition of social impact, and then develop a method for quantifying\nsocial impact based on the data I had gathered. All of this would have been in\naddition to a similar process of quantifying climate change around the world\nwith the Global Historical Climatology Network (GHCN) data. For a full-time college student with about 6 months to\ncomplete the project, this was not feasible.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Rewriting my research question not only made my life\neasier, but it also helped me create a better and more compelling story. For\nexample, narrowing the scope of the analysis to the United States in the past\n50 years made sense from both a storytelling and a data quality perspective.\nThe United States was the best represented country in the dataset, and\nmeasurements are more accurate over the past 50 years as opposed to the past\n100 years. 
I also knew the audience would almost entirely consist of people\nfrom the United States, and many middle-aged and older viewers would have been\nalive for most if not all of the period of analysis.<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>Recognize\nthat performing analysis is a long and iterative process.<\/strong><\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Even after I narrowed the scope of the analysis, the data\nprocessing for this project presented a significant challenge. I initially had\n50 datasets (one per year), each with one row per weather station per day. I\nneeded to manipulate this data to get a new dataset with one row per county\nwhere the columns captured county-level temperature change over the past 50\nyears. Recognizing there was a lot of\nwork to be done, my advisor Dr. Tom Fisher and I broke the process down into a series\nof smaller steps. We transformed the data to get one row per station per year,\nand then to get one row per county per year. Finally, we measured the change in\neach county over time to obtain a dataset with one row per county.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">A lot of this project involved thinking about the next step\nforward from where we were standing. Over the course of many months, we\nincrementally changed the dataset to obtain the final product. Some steps were\nrepeated multiple times, and some steps were simplified or reduced to fit\nbetter with our end goal. For example, we originally processed the data for the\nlast 120 years, and then decided to use only the last 50. Of all the summary\nstatistics we calculated, only two turned out to be useful. I also\nreran the analysis to include precipitation measures late in the\nprocess. 
It can be frustrating to feel like you\u2019ve wasted your time on unused\ndata processing or analysis, but it\u2019s all part of the journey towards the final\nproduct.<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>Statistics\nis all about the story.<\/strong><\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">I think one of the most frustrating parts of any data\nproject is that the parts of the project you dedicate the most time to are\ngenerally not what you share with the audience. Instead, the success of the\nproject is judged on its results and how you deliver them. After processing the climate data, I joined\nthe final results to global warming survey data collected by Yale in 2019. Using the full dataset, I built a series of maps\nand graphs to examine the relationship between observed climate change and\npublic perception of climate change. Unfortunately, there wasn\u2019t a strong\ncorrelation between the two sets of variables.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">While my research question was answered, the end of the\nstory just wasn\u2019t satisfying. So rather than end the story there, I looked into\nseveral demographic factors to see if any of them were highly correlated with\nbeliefs about climate change. Unsurprisingly, political ideology (represented by\ndata from the 2016 presidential election) had the strongest correlation with\nbeliefs about climate change. This extra step created a much better ending to my\nstory. 
Rather than ending with a disappointing lack of correlation, I was able\nto construct a narrative about how Americans are guided more by political\nideology and belief than by empirical data, a suggestion that is especially\nrelevant today in a world where public opinion doesn\u2019t always align with\nscientific findings.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Conclusion<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">I want to close with my biggest life lesson from this project, which is to always say yes to opportunities for practical experience in a field you\u2019re passionate about. When I first joined the DataExpo team as an observer my sophomore year, I struggled to complete basic coding tasks. However, my experience shadowing that year not only led to my successes with this year\u2019s project, but it also resulted in the opportunity to work more closely with statistics faculty and students, in addition to getting me involved with CADS. I want to thank Dr. Tom Fisher, Dr. Karsten Maurer, Matthew Snyder, Alison Tuiyott and Ben Schweitzer for making the DataExpo such a great experience. All of these opportunities have greatly improved my technical skills and prepared me for life after college. As I look back on my time at Miami, I will always be grateful for these experiences and for all of the people who helped me along the way. 
<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">About the Author<\/h4>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"200\" height=\"200\" src=\"https:\/\/sites.miamioh.edu\/the-center-for-analytics-and-data-science\/files\/2020\/10\/Lydia-Carter-photo.jpg\" alt=\"\" class=\"wp-image-80\" srcset=\"https:\/\/sites.miamioh.edu\/the-center-for-analytics-and-data-science\/files\/2020\/10\/Lydia-Carter-photo.jpg 200w, https:\/\/sites.miamioh.edu\/the-center-for-analytics-and-data-science\/files\/2020\/10\/Lydia-Carter-photo-150x150.jpg 150w, https:\/\/sites.miamioh.edu\/the-center-for-analytics-and-data-science\/files\/2020\/10\/Lydia-Carter-photo-144x144.jpg 144w\" sizes=\"auto, (max-width: 200px) 100vw, 200px\" \/><figcaption> <a href=\"https:\/\/www.linkedin.com\/in\/lydia-carter-oh\/\">Lydia Carter<\/a> is a senior at Miami University majoring in Statistics and Analytics. She has interned for CADS since Fall 2019.<\/figcaption><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>When I was a prospective student touring colleges, I assumed research always involved test tubes and lab rats. However, as I reflect on the research project I completed this year, I am happy to say I did not spend any time in a lab. 
Instead, all I needed was my computer to participate in the [&hellip;]<\/p>\n","protected":false},"author":3098,"featured_media":79,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"_s2mail":"","footnotes":""},"categories":[3],"tags":[],"class_list":["post-78","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-student-projects"],"_links":{"self":[{"href":"https:\/\/sites.miamioh.edu\/the-center-for-analytics-and-data-science\/wp-json\/wp\/v2\/posts\/78","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sites.miamioh.edu\/the-center-for-analytics-and-data-science\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sites.miamioh.edu\/the-center-for-analytics-and-data-science\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sites.miamioh.edu\/the-center-for-analytics-and-data-science\/wp-json\/wp\/v2\/users\/3098"}],"replies":[{"embeddable":true,"href":"https:\/\/sites.miamioh.edu\/the-center-for-analytics-and-data-science\/wp-json\/wp\/v2\/comments?post=78"}],"version-history":[{"count":0,"href":"https:\/\/sites.miamioh.edu\/the-center-for-analytics-and-data-science\/wp-json\/wp\/v2\/posts\/78\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/sites.miamioh.edu\/the-center-for-analytics-and-data-science\/wp-json\/wp\/v2\/media\/79"}],"wp:attachment":[{"href":"https:\/\/sites.miamioh.edu\/the-center-for-analytics-and-data-science\/wp-json\/wp\/v2\/media?parent=78"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sites.miamioh.edu\/the-center-for-analytics-and-data-science\/wp-json\/wp\/v2\/categories?post=78"},{"taxonomy":"post_tag","embeddable":tru
e,"href":"https:\/\/sites.miamioh.edu\/the-center-for-analytics-and-data-science\/wp-json\/wp\/v2\/tags?post=78"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}