Category Archives: Student Projects

Concluding DataFest 2024

The Center for Analytics and Data Science is happy to announce the close of this DataFest season. DataFest’24 was made possible by our sponsors, Benchmark Gensuite and Fifth Third Banking.


Overall, 80 participants from six different schools competed this year. We would like to thank the students who attended from the following schools:

  • Miami University
  • BGSU
  • College of Wooster
  • Xavier University
  • University of Cincinnati
  • Capital University

Winning Teams

Teams were ranked using a score-based system. The winning teams were:

  • Top Score: Data-Holic Redhawks
  • Best Insight: Drake’s DataFesters
  • Best Visualization: The Derivative Falcons

DataFest@Miami ’24 Registration Opens

DataFest returns to Miami University from April 5th – April 7th for a data-packed 48 hours.

DataFest, now in its eighth year at Miami, brings together teams of 3 – 5 analysis-minded undergraduates as they compete to extract a narrative from real-world datasets. These datasets are provided in cooperation with the American Statistical Association as part of the broader international event.

This year’s DataFest will find teams working in the new McVey Data Science Building, taking advantage of its numerous open-concept study spaces as they condense their insights into a short presentation. Along the way, students will have the opportunity to bounce ideas off a group of “roving consultants” – subject matter experts who volunteer their time so that students can leverage the benefit of real-world experience.

All of this leads to Sunday afternoon, when teams will showcase their understanding of the data by presenting to a group of expert judges. After deliberation, three teams will be chosen as winners across a variety of categories.

New this year, the Center for Analytics and Data Science will be hosting an information session on February 26th. The session is intended for students who have never participated in DataFest, but we welcome any undergraduate student with questions about how this year’s competition might differ from years past.

Sports and Data Science: A Crossroads

To be completely honest, when I hear “data science,” the first thing that comes to mind is complex data sets being manipulated by very smart people with very smart computers, which isn’t a bad thing! However, there are many other applications and uses for data science and analytics that attract different people with different interests. One application that is particularly interesting to me, and hopefully to you as well, is the use of analytics and data science in the sport industry.

My name is Kade Peterson and I am a sophomore marketing major and sport management minor. I started working with CADS as a marketing intern this fall semester to gain marketing experience and understand how Miami University uses data science around campus. I saw an opportunity to connect what I have been learning through CADS with my love of sports to investigate the use of data science in athletics. I had a conversation with my roommate, who works for the university’s baseball team as a student manager. He told me about the different data collection methods they have for the baseball team, as well as the processes they go through to analyze this data and present it to the rest of the organization to recommend action and show what is effective for the team.

After having this discussion, I wanted to further investigate the use of data analytics in different areas of the sport industry. I was drawn towards researching sports analytics in golf because it is one of the sports I am most passionate about. One of the biggest stories of the year in the golfing community was the development of a much larger, much longer-hitting Bryson DeChambeau. He has earned the nickname The Mad Scientist because of his devotion to tracking data and the mechanics of the body. Upon returning to competition after the COVID-19 pandemic, Bryson had added 30 pounds to his frame from daily workouts and strength training. The added size has resulted in him hitting the ball a LOT farther. He is averaging 330 yards off the tee this season, which is up almost 40 yards from his previous season’s average. This massive increase has put him in first place for average driving distance on tour, nearly 10 yards ahead of his closest competitor (Source).

Following the return to competition, there were lots of questions surrounding DeChambeau and whether this change was worth it. He silenced all of those questions this year, placing in the top 10 in 9 of the 17 events he played. He also earned his first major title by winning the U.S. Open this year (a major is one of the 4 biggest tournaments each year: the U.S. Open, The Open, The Masters, and the PGA Championship). Bryson DeChambeau is a breath of fresh air in a sport that has long been stuck in its ways and lacking diversity. He is a self-described nerd in the game of golf and has turned towards data in a game that is generally feel-based. When promoting the launch of an app that helps players pick the best golf ball for themselves, Bryson said, “the data analytics aspect of golf has helped me understand, from a percentage standpoint, where to hit shots, how to play a course, what clubs to use based on conditions, etc…” (Source). Bryson has turned to data analytics to try to give himself an advantage over his competition. Combined with his physical transformation, this has made him one of the most talked-about players on tour.

We are at a point where nearly every sport being played has some sort of data tracking and analytics involved with it, and the reach of data analytics in sports has increased over time. For example, the University of Connecticut just hosted a Sports Analytics Symposium with over 300 participants, including students at the undergraduate and high school level. The purpose of this conference is to provide information to students who are just beginning to work with sports analytics, or who do not know much about it but know they are interested in sports and math. The symposium had four keynote speakers, including Brian MacDonald, the director of sports analytics at ESPN (Source). Universities are also starting to offer more sports analytics opportunities to students. Miami University gives students the ability to earn a sports analytics certificate as well as a sports analytics minor. The crossroads of data analytics and sports has become more prominent in the sport industry, and schools are moving to accommodate this industry shift.

In conclusion, the sport industry is usually not the first thing that comes to mind when discussing data analytics. However, analytics is a quickly growing segment of the sport industry, and sports offer a unique use for it. This shows how widely data analytics has spread and how it impacts so many different industries in the world.

About the Author

Kade Peterson is a sophomore majoring in marketing with a minor in sport leadership and management. He also interns for the Center for Analytics and Data Science’s marketing team.

Insights from winning ASA’s DataExpo

When I was a prospective student touring colleges, I assumed research always involved test tubes and lab rats. However, as I reflect on the research project I completed this year, I am happy to say I did not spend any time in a lab. Instead, all I needed was my computer to participate in the DataExpo, a data science competition sponsored by the American Statistical Association (ASA).

Each year, students are given a government dataset and guiding questions, and then are expected to develop their own research project and present findings to judges at a statistics conference in the summer. This year, participants were asked to analyze data from the Global Historical Climatology Network, which contains weather records for the entire world since the 18th century. My project examining the relationship between public perception of climate change and county-level weather trends in the United States won first place in the competition. I learned a lot about data-driven research projects through the DataExpo and thought I would share some of my insights from the process.

  • Set boundaries to avoid becoming overwhelmed by an open-ended project.

My original idea for this project was to look at the social impact of climate change around the world. This is clearly very different from my final research question regarding public perception of climate change in the United States. Why did my research question change so drastically? It was important to narrow the scope of my analysis to something specific and manageable.

In order to analyze my original question, I would have needed to define social impact as it relates to climate change, find data that related to my definition of social impact, and then develop a method for quantifying social impact based on the data I had gathered. All of this would have been in addition to a similar process of quantifying climate change around the world with the GHCN data. As a full-time college student with about 6 months to complete the project, this was not feasible.

Rewriting my research question not only made my life easier, but it helped me create a better and more compelling story. For example, narrowing the scope of the analysis to the United States in the past 50 years was compelling from both a storytelling and data quality perspective. The United States was the best represented country in the dataset, and measurements are more accurate over the past 50 years as opposed to the past 100 years. I also knew the audience would almost entirely consist of people from the United States, and many middle-aged and older viewers would have been alive for most if not all of the period of analysis.

  • Recognize that performing analysis is a long and iterative process.

Even after I narrowed the scope of the analysis, the data processing for this project presented a significant challenge. I initially had 50 datasets (one per year), each with one row per weather station per day. I needed to manipulate this data to get a new dataset with one row per county, where the columns captured county-level temperature change over the past 50 years. Recognizing there was a lot of work to be done, my advisor, Dr. Tom Fisher, and I broke the process down into a series of smaller steps. We transformed the data to get one row per station per year, and then to get one row per county per year. Finally, we measured the change in each county over time to obtain a dataset with one row per county.
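The three reshaping steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, station IDs, and county names are made up, and "change" is taken to be the last year's value minus the first year's value, which may not be the measure actually used in the project (a fitted trend would be another reasonable choice).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical daily records: (station_id, county, year, mean_temp_c).
# Real GHCN files have many more fields; these values are invented.
daily = [
    ("S1", "Butler", 1970, 10.0),
    ("S1", "Butler", 1970, 12.0),
    ("S2", "Butler", 1970, 11.0),
    ("S1", "Butler", 2019, 12.0),
    ("S1", "Butler", 2019, 14.0),
    ("S2", "Butler", 2019, 13.0),
    ("S3", "Preble", 1970, 9.0),
    ("S3", "Preble", 2019, 9.5),
]

def station_year_means(rows):
    """Step 1: collapse daily rows to one value per station per year."""
    groups = defaultdict(list)
    for station, county, year, temp in rows:
        groups[(station, county, year)].append(temp)
    return {key: mean(vals) for key, vals in groups.items()}

def county_year_means(station_years):
    """Step 2: average the stations in a county to one value per county per year."""
    groups = defaultdict(list)
    for (_station, county, year), temp in station_years.items():
        groups[(county, year)].append(temp)
    return {key: mean(vals) for key, vals in groups.items()}

def county_change(county_years):
    """Step 3: one value per county (assumed: latest year minus earliest year)."""
    by_county = defaultdict(dict)
    for (county, year), temp in county_years.items():
        by_county[county][year] = temp
    return {c: yrs[max(yrs)] - yrs[min(yrs)] for c, yrs in by_county.items()}

change = county_change(county_year_means(station_year_means(daily)))
print(change)
```

Keeping each step as its own function mirrors the idea of breaking the work into smaller pieces: any one step can be rerun or swapped out (for example, to add precipitation) without touching the others.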

A lot of this project involved thinking about the next step forward from where we were standing. Over the course of many months, we incrementally changed the dataset to obtain the final product. Some steps were repeated multiple times, and some steps were simplified or reduced to fit better with our end goal. For example, we originally processed the data for the last 120 years, and then only decided to use the last 50. Of all the summary statistics we calculated, only two of them turned out to be useful. I also reran the analysis to include precipitation measures late in the analysis process. It can be frustrating to feel like you’ve wasted your time on unused data processing or analysis, but it’s all part of the journey towards the final product.

  • Statistics is all about the story.

I think one of the most frustrating parts of any data project is that the parts of the project you dedicate the most time to are generally not what you share with the audience. Instead, the success of the project is judged based on the results of your project and how you deliver them.  After processing the climate data, I joined the final results to global warming survey data collected by Yale in 2019.  Using the full dataset, I built a series of maps and graphs to examine the relationship between observed climate change and public perception of climate change. Unfortunately, there wasn’t a strong correlation between the two sets of variables.

While my research question was answered, the end of the story just wasn’t satisfying. So rather than end the story there, I looked into several demographic factors to see if any of them were highly correlated with belief on climate change. Unsurprisingly, political ideology (represented by data from the 2016 presidential election) had the strongest correlation with belief on climate change. This extra step created a much better ending to my story. Rather than ending with a disappointing lack of correlation, I was able to construct a narrative about how Americans are more guided by political ideology and belief than empirical data, a suggestion that is especially relevant today in a world where public opinion doesn’t always align with scientific findings. 
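To make the comparison above concrete, here is a minimal sketch of how two candidate explanations can be compared using the Pearson correlation coefficient. All numbers are invented for illustration and are not taken from the project, and the post does not say which correlation measure was actually used, so Pearson is an assumption.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)

# Hypothetical county-level values (one entry per county).
temp_change = [0.4, 0.9, 0.6, 0.5, 0.6]   # observed warming, degrees C
belief = [60, 58, 71, 55, 66]             # % who say climate change is happening
vote_share = [48, 45, 62, 40, 55]         # % vote share for one party

r_temp = pearson(temp_change, belief)
r_vote = pearson(vote_share, belief)
print(f"belief vs. observed warming: {r_temp:+.2f}")
print(f"belief vs. vote share:       {r_vote:+.2f}")
```

With data shaped like this, the first coefficient sits near zero while the second is close to one, which is the kind of contrast that turns a flat result into a story. A correlation alone does not establish why the two move together.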

Conclusion

I want to close with my biggest life lesson from this project, which is to always say yes to opportunities for practical experience in a field you’re passionate about. When I first joined the DataExpo team as an observer my sophomore year, I struggled to complete basic coding tasks. However, my experience shadowing that year not only led to my successes with this year’s project, but it also resulted in the opportunity to work more closely with members of the statistics faculty and students, in addition to getting me involved with CADS. I really appreciate Dr. Tom Fisher, Dr. Karsten Maurer, Matthew Snyder, Alison Tuiyott and Ben Schweitzer for making the DataExpo such a great experience. All of these opportunities have greatly improved my technical skills and prepared me for life after college. As I look back on my time at Miami, I will always be grateful for these experiences and for all of the people who helped me along the way.

About the Author

Lydia Carter is a senior at Miami University majoring in Statistics and Analytics. She has interned for CADS since Fall 2019.

Chemists and Analytics, A Surprising but Fruitful Partnership

My first experience as a CADS intern was typical of many. I worked with two other students and a faculty advisor on a project for a corporate client. The project followed a typical and expected process: introduction of the problem, lots and lots of industry research, applying analytical solutions to said problem, and then a final recommendation and presentation to the client. Once finished, I was excited and looking forward to a similar experience the following semester. However, before the semester ended, my team’s faculty advisor hinted that my skills might be put to the test next semester on a project with the chemistry department. Little did I know that this opportunity would teach me more about chemistry, analytics, data science, and the intersection of them than I could have ever imagined.

For a little background, I am a current senior at Miami University, where I am studying finance and business analytics. When I was introduced to CADS, I knew this was something I wanted to be involved with. It was an opportunity to apply the skills and knowledge I had gained in the classroom, along with developing new ones, to fun and interesting projects. When one of my professors, Dr. Weese, mentioned she wanted me to be involved in a project for Miami University’s chemistry department, I was immediately intrigued. Never did I imagine I could apply my skills to a problem faced by my university’s chemists. That is, until our analytics team met with the chemistry team and we all realized the amount of untapped potential this partnership held. This partnership consisted of undergraduate students, graduate students, PhD candidates, professors, and even a department head from the Chemistry and Information Systems & Analytics departments at Miami University.

When you think about it, much of the typical chemist’s work is repetitive and manual. Compounds are researched, tested, and experimented with by hand for the most part. Computers and robots can automate some of this if you have enough resources, but the point is that almost every part of this process normally has to be done by hand, either a human’s or a robot’s. The advent of machine learning and artificial intelligence has already transformed many industries by eliminating, or at a minimum reducing, many of these tedious tasks. Thanks to Dr. Zishuo “Toby” Cheng and his curious mind, the question “Why can’t we apply machine learning to our beta lactamase inhibitor research?” was posed. What I loved most about this proposition is that nobody had tried anything exactly like it before. There was every reason for this partnership to work; we had the data, the smarts, and the desire, just nothing to go off of. However, this wasn’t an issue or disadvantage at all; instead, it forced us to think outside the box, try every possible way to do something, and see what worked and, many times, what didn’t. Not that having something to model after is ever bad, but it’s a human tendency to latch on to what was done before as the correct way. In our work, just about everything we did was “right”, only because there wasn’t anything to prove otherwise.

After several months on the job, I think it is safe to say this partnership has been a huge success. By throwing numerous data science and analytical methods at the problem, we were able to narrow the search space of unknown compounds from over 70,000 to just 3,000. When you consider that in a normal situation every one of these 70,000 compounds would have to be tested, it quickly becomes clear how important this feat was. No longer do you have to take a complete shot in the dark and hope you find a good compound; instead, you are able to look through only the compounds that have the highest probability of being successful per our models. Pending the results of the high throughput screening of these 3,000 compounds, we could eventually apply our analyses and models to a database of millions of unknown compounds.
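The idea of looking only at the compounds a model rates most highly can be sketched as a simple ranking step. This is a hypothetical illustration, not the team's actual pipeline: the compound IDs and probabilities below are invented, and in practice the scores would come from whatever trained models were used.

```python
import heapq

def shortlist(scores, k):
    """Return the k compound IDs with the highest predicted probability, best first."""
    best = heapq.nlargest(k, scores.items(), key=lambda item: item[1])
    return [compound_id for compound_id, _prob in best]

# Hypothetical model output: compound ID -> predicted probability of being
# a good inhibitor. These names and numbers are made up.
scores = {
    "cmpd-001": 0.12,
    "cmpd-002": 0.91,
    "cmpd-003": 0.47,
    "cmpd-004": 0.78,
    "cmpd-005": 0.05,
}

top = shortlist(scores, 2)
print(top)
```

The same call with a larger `k` would produce a screening list of any chosen size, so the lab's testing capacity, rather than the size of the compound library, decides how many candidates move forward.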

It was in these times that the partnership really shone. As analytics students with no background in chemistry more advanced than high school chemistry, all of our results meant little to nothing to us. However, with our knowledge of what the numbers were showing and the chemists’ knowledge of what the numbers represented, we were able to uncover some incredible insights. For example, we strategically employed models with some form of interpretability that gave insight into what features of a compound make a good beta lactamase inhibitor. A couple of the most important variables made sense and were already well known to the chemists as important features. However, there were several features of good inhibitors according to our models that had never been considered before. The chemists determined these features still made logical sense, but simply had not been seen in past research. Although it isn’t the discovery of the next greatest beta lactamase inhibitor yet, it is insights like these that validate we are on the right track and give a glimpse into the incredible potential for interdisciplinary teams like ours.

What’s next? For our team, we will continue to explore better methods for supporting the chemists’ research of beta lactamase inhibitors, hopefully leading to further insights into these important compounds. On a much larger scale, I hope to see many more partnerships like this one arise around Miami University. I can imagine successful partnerships with areas all over Miami. Thanks to CADS, these partnerships aren’t a matter of if they will happen; it’s simply a matter of when.

About the Author

Mitch Fairweather is a Miami University senior studying Finance and Business Analytics.