Software Engineering Needs Philosophy Majors

Artificial intelligence is expected to take over coding. This means that engineers with expertise in English, psychology and philosophy are highly valued for their critical thinking and communication skills. (Cantor 2024)

In the Harvard Business Review, Marco Argenti, Chief Information Officer at Goldman Sachs, made this argument. Argenti told his daughter that if she wanted to be an engineer, she needed to focus on learning philosophy. He wrote, “Coming from an engineer, that might seem counterintuitive, but the ability to develop crisp mental models around the problems you want to solve and understanding the why before you start working on the how is an increasingly critical skillset, especially in the age of AI.”

AI will start taking over the brunt work of coding and software engineers will need to be able to prove that they are still needed. Knowing how to ask the right questions and think outside the box will be important skills for the upcoming generation of software engineers. Creativity is one of the most important skills people can have.

Cantor, Melissa. ‘Goldman Wants Philosophy Majors.’, LinkedIn News, 22 Apr. 2024, http://www.linkedin.com/news/story/goldman-wants-philosophy-majors-6692170/.

Concluding DataFest 2024

The Center for Analytics and Data Science is happy to announce a close to this DataFest season. DataFest’24 was made possible by our sponsors Benchmark Gensuite and Fifth Third Banking.

Overall there were 80 participants from six different schools who competed this year. We would like to thank students for attending from the following schools:

Miami University
BGSU
College of Wooster

Xavier University
University of Cincinnati
Capital University

Winning Teams

Teams were ranked using a score based system. The winning teams were:

Top Score: Data-Holic Redhawks
Best Insight: Drake’s DataFesters
Best Visualization: The Derivative Falcons

Data Visualization (Without Coding)

This is the second bootcamp in The Center for Analytics and Data Science Faculty Fellow Bootcamp Series

Shenyue Jia

Join Shenyue Jia, Center for Analytics and Data Science Faculty Fellow and assistant professor of Geography, as she leads an activity-based bootcamp about data visualization in various platforms and the usage of GitHub for non-coders.

Over a three day bootcamp about data visualization and GitHub, Jia will help the audience take their knowledge of data visualization and apply it in multiple platforms, such as: Excel, Google Sheets, DataWrapper, and Tableau. She will teach the audience how to use GitHub for maintaining a project portfolio.

4/18: Data visualization in Excel, Google Sheets, and DataWrapper. Part one of a GitHub mini-lesson for non-coders
4/25: Part two of a GitHub mini-lesson for non-coders. Beginners Guide for Tableau.
5/2: Beginners Guide for Tableau continues. Uploading bootcamp project to GitHub.

This entry in the CADS Faculty Fellow Bootcamp Series presumes basic knowledge of data visualization. Due to the interactive elements of this bootcamp, please bring your laptop.

Bootcamp horizontal flyer that displays the name of the workshop

The Center for Analytics and Data Science is proud to be able to bring unique views into the arena of data science through its Faculty Fellow program. Thanks to the wide variety of talent offered by these gifted academics, CADS is able to provide examples of data science principles as they apply to the research of an array of disciplines. We thank all of our Faculty Fellows for their hard work and willingness to share.

If you have a topic that you would like to see covered as part of the Faculty Fellows Bootcamp Series, or any other question please contact the Center for Analytics and Data Science at cads@miamioh.edu

Population Bias In ML Based Medical Research

Unequal outcomes in medical research has been an ongoing issue, but a new study indicates that machine learning may not be an automatic solution to this problem. (Conev, et al. 2024)

A team of researchers from Rice University in Houston, Texas have recently published a study examining how the utilization of a biased dataset within a machine learning model can result in a disparity of immunotherapy treatments across different income classifications and geographic populations.

In an analysis of available datasets the team found that these datasets were “biased toward the countries with higher income levels.” Several solutions are suggested, including a conscious effort to expand data collection to under-represented geographic populations as well as creating models that train on the characteristics of each individual patient.

Conev, A., Fasoulis, R., Hall-Swan, S., Ferreira, R., Kavraki, L. (2024) ‘HLAEquity: Examining biases in pan-allele peptide-HLA binding predictors’, iScience, 27(1), https://www.sciencedirect.com/science/article/pii/S2589004223026901

DataFest@Miami ’24 Registration Opens

DataFest returns to Miami University from April 5th – April 7th for a data-packed 48 hours.

DataFest returns to Miami University from April 5th – April 7th for a data-packed 48 hours

DataFest, now in its eighth year at Miami, brings together teams of 3 – 5 analysis-minded undergraduates as they compete to extract a narrative from real-world datasets. These datasets are provided in cooperation with the American Statistical Assocation as part of the broader, international, event.

This year’s DataFest will find teams working in the new McVey Data Science Building, taking advantage of its numerous open-concept study spaces as they condense their insights into a short presentation. Along the way, students will have the opportunity to bounce ideas off a group of “roving consultants” – subject matter experts who volunteer their time so that students can leverage the benefit of real-world experience.

All of this leads to Sunday afternoon, when teams will showcase their understanding of the data by presenting to a group of expert judges. After deliberation, three teams will be chosen as winners across a variety of categories.

New this year, the Center for Analytics and Data Science will be hosting an information session on February 26th. Intended for students who have never participated in DataFest, we welcome any undergraduate student with questions about how this year’s competition might be different than years past.

Social Network Analysis in R

The Center for Analytics and Data Science Faculty Fellow Bootcamp Series is kicking off for Spring Semester.

Join Kevin Reuning, Center for Analytics and Data Science Faculty Fellow and assistant professor in Political Science, as he leads an activity-driven exploration of the (sometimes hidden) connections that link us as a society.

Over three consecutive Wednesday evenings in McVey 168, Reuning will help the audience take their knowledge of R and data analysis as it pertains to more traditional data sets and apply it to the interconnected web that is the foundation of a social networking modelling.

March 6th: Introduction to network terminology and data
March 13th: Visualizing networks
March 20th: Calculating basic network statistics

This entry in the CADS Faculty Fellow Boot Camp Series presumes at least some level of R proficiency and working knowledge of basic data analysis principles. Due to the interactive nature of the exploration, please bring your laptop.

Coming to an Excel Near You…

Python. Yes, the ever-popular Python programming language is set to be featured in Excel in the near future. Microsoft is partnering with Anaconda, a leading enterprise grade Python repository. Python in Excel leverages the Anaconda distribution for Python running in Azure, which includes the most popular Python libraries such as pandas for data manipulation, statsmodels for advanced statistical modeling, as well as Matplotlib and seaborn for data visualization. For now Python in Excel is only available in public preview for those in the Microsoft 365 Insiders program Beta Channel. However, this feature will roll out to Excel for Windows first, starting with build 16818, and then to the other platforms at a later date. But soon you will be able to do advanced data analysis in the familiar Excel environment by accessing Python directly from the Excel ribbon. No set-up or installation will be required. Read more about Python in Excel.

Can ChatGPT help us in Data Science?

By now you have probably heard about or tried ChatGPT. ChatGPT is a Natural Language Processing “chatbot,” developed by OpenAI and released to the public in November of 2022 for testing. ChaptGPT is currently free to use on the web for anyone with an OpenAI account, which is currently also free.

In a nutshell, you ask ChatGPT a question in a “conversational way” and it can use “the whole Internet” to help it provide a response – although its knowledge cutoff is currently 2021 (ask it about an event that occurred or a celebrity that passed away in the last year and you’ll see what I mean). The release of NLP chatbots like this is huge in a lot of ways, and scary in a lot of ways. Like… if AI like this can do your job in less time for less cost, is your job in danger? I’ve been hearing that one for years, and have somehow been able to stay under the radar. 😉

Anyway, how can ChatGPT be of use in analytics and data science? Well, let’s ask!

Q. How can ChatGPT help a data analyst?

A. “ChatGPT can assist a data analyst in several ways, such as:

Natural Language Processing (NLP) tasks, such as text summarization and sentiment analysis.
Generating human-like text based on the data provided, such as automatically generating reports or descriptions of the data.
Helping to clean and prepare data for analysis by identifying and extracting relevant information from unstructured text data.
Automating repetitive tasks such as data entry or data cleaning.
Answering questions about the data, by training on specific data sets, allowing the data analyst to query the model and receive insights on the data.

Overall, ChatGPT can save time and effort for a data analyst by automating certain tasks and providing insights and information quickly.“

I input the question above into ChatGPT and the answer you see was provided in a matter of seconds – I didn’t write that answer. Interestingly enough, ChatGPT will not give you a duplicate answer if you input the same question twice. In fact, there is a “Regenerate Response” button at the bottom of the page which provides a new response in case you didn’t like the first one. Let’s try that…

A2. “ChatGPT can assist a data analyst in a number of ways, including:

Data cleaning and preprocessing: ChatGPT can be used to automate tasks such as removing outliers, filling in missing values, and transforming data into a format that is ready for analysis.
Text summarization: ChatGPT can be used to summarize large amounts of text data, such as customer feedback or social media posts, making it easier for analysts to identify key trends and patterns.
Natural language processing: ChatGPT can be used to extract insights from unstructured text data by identifying entities, sentiment, and topics.
Generating reports and visualizations: ChatGPT can be used to generate automated reports and visualizations based on the data analyzed, making it easier for analysts to communicate their findings to stakeholders.
Automating repetitive tasks: ChatGPT can automate repetitive tasks such as data entry and data validation, freeing up data analysts to focus on more complex and important tasks.“

And you can ask ChatGPT to elaborate or give a more specific example of any part of the response you get. You can keep tweaking or rephrasing your question until you get the response that is most helpful or relevant.

ChatGPT has a number of applications in writing code as well. You can ask ChatGPT to write the code needed to accomplish a task in virtually any programming language or even in common applications like Excel. For example, if you’re not familiar with all of the functions in Excel (and who is?), you can ask ChatGPT to write a formula or function needed to accomplish the task. You may need to input the coordinates (rows and columns) the formula or function should act upon and then you can likely just copy & paste the code into the formula bar or cells in Excel. Or if you are trying to automate a task, you can ask ChatGPT to write an Excel macro, then copy/paste the Visual Basic code into Excel – [Alt+F11] Win or [Opt+F11] Mac. There are also applications for learning to code: Let’s say you are reasonably fluent in R but are trying to learn Python. You can input some code in R and ask ChatGPT to give you the equivalent code in Python, or vice-versa. And ChatGPT may ask clarifying questions to help debug the code. And this is just the tip o’ the iceberg, as they say. There are limitations and thing to watch out for. You can find these and more information at the OpenAI > ChatGPT website. Very cool, try it while it’s still free!

Share the Love: Data Analytics, Coding, and Ohio Tech Day

True confession: I’m daunted by data analytics and coding. Well, I used to be. Maybe this is because I’ve always thought of myself as a language nerd. Among my circle of friends in college, I was the only one majoring in two languages—English and German. Actually, I started off pre-med, bouncing around from major to major until I finally committed to what I loved—and what I feared wouldn’t land me a job. But what I realized not too long after graduation is that my humanities education helped me focus on nuances, on the details that often go unnoticed, on the stories inherent in a seeming jumble of symbols on a page or screen.

Flash-forward several years to me as a grad student at Miami University. I was finally pursuing my other scholastic love duo—design and tech. Walking one day to King Library, I saw a sandwich board advertising a new Data Analytics Awareness Microcredential from the Center for Analytics & Data Science (CADS). Free to current students! And starting this week! Was this for me? I liked the “awareness” part. Yeah, I got this. I eagerly snapped a pic of the ad and signed up as soon as I got home.

Flash-forward again a few years, now to me as marketing director at Joot, a local tech startup that I was fortunate to discover at Miami’s Architecture, Design & Emerging Technology Fair. I’m lucky to have landed here, having walked a rather winding path to this exciting financial-services-plus-tech destination. One of the first opportunities I learned about after joining Joot is our partnership with OhioX, a nonprofit helping people from all walks of life participate in and contribute to the vast world of tech and innovation, right here in Ohio. Working on a few OhioX committees introduced me to Ohio Tech Day, a free event many months in the making and finally kicking off on September 24, 2021.

Here’s where things come full circle. As a self-proclaimed language nerd, I was given the chance, thanks to CADS, to get my Data Analytics Awareness Microcredential, which demystified data analytics and coding to the point where I could actually do this stuff—and even apply these skills in my daily work. I had to find a way to give back. So I partnered with Phillip DiSilvestri, Joot’s incredibly talented sales and marketing intern (and current Miami student), to develop a coding tutorial that empowers students to learn the fundamentals of data analytics and coding in order to find relevant scholarship opportunities. This tutorial replicates, on a small scale, the kind of engagement and access I enjoyed through the CADS microcredential program.

But the story’s not over yet. Our coding tutorial, which uses data from DataOhio, sparked several conversations with companies and schools interested in making the tutorial even more engaging and accessible to more people. With the tremendous help of Dolr, we’re expanding the tutorial into more programming languages—including Python, which the CADS microcredential introduced me to—and giving students more opportunities to share their work.

Our scholarship map coding tutorial is living proof that anyone can do data analytics and coding. And anyone can apply these highly valuable skills to do good—for ourselves and others who maybe, at first, didn’t think we could.

About the Author

Karen Mauk graduated from Miami University’s Graduate School with an MFA in Experience Design (2021) and a graduate certificate in Interactive Media Studies (2019). She also participated in Miami’s miniMBA (2021) and Data Analytics Awareness Microcredential (2020) programs.

A Couple of Notes on Learning Sport Analytics

It is more than obvious that sport analytics is booming thanks to the technology breakthrough, algorithm advancement, and upgraded analytical tools.

If you are a fan of sport analytics… this thought probably has emerged in your mind: how can I be able to perform analytics on sport data?

Let me start with some analytical tools. Either R or Python would be great. These open-resource tools have a wide spectrum of packages allowing you to address various sport analytics issues from fundamental data wrangling to cutting-edge deep learning. As time goes by, more diverse and powerful packages would be developed or updated, so you don’t need to keep learning new tools to address different aspects of sport analytics. As a follow-up question: which one should I eventually choose, R or Python? Here is some information that may help you make a final selection. R comparatively has more well-developed packages in data analytics and has a more convenient IDE (i.e., RStudio) than Python IDEs (e.g., Spyder or PyCharm). But if you’re also interested in other programming fields like web design and app development in addition to sport analytics, Python is a better option given its generic nature. Please note this is status quo. Both R and Python have been updating themselves. For example, in recent years, R added more packages in deep learning and text mining; meanwhile, the data analytics arena of Python also got richer and richer.

A main point distinguishing sport analytics from general data analytics is its contextualized attributes, such as analytics purposes, data features, analytics criteria, and result interpretations. To gain a better understanding on these contextulized attributes, you need to know your target sport program(s) in depth. So where to start to gain these insights? Observing training and games, communicating with coaches and players, learning from experienced analysts in that specific field are highly valuable. As a Miamian, working for varsity teams on campus would be a great start. Don’t forget to educate yourself via industry conferences and theoretical courses such as sport economics, sport psychology, athletic training, coaching, etc.

Having systematic trainings is crucial for a successful run in sport analytics. Oftentimes, un-rigorous analytics are worse than no analytics, which is largely amplified in the field of sport analytics given its high stake. Therefore, going through systematic trainings in algorithm, probability, measurement, and algebra are necessary. You should also pay attention to the technology sector which has been reshaping the landscape of sports analytics. For example, movement tracking systems (e.g., SportUV, Rapsodo, and K-motion) and health mentoring equipment have elevated sport performance analytics to another level.

Lastly, I would like say that conducting sport analytics professionally is a niche field, whereas sport analytics education is for all practitioners, as well as for those working in related industries. No matter if you are inspired to work in sport marketing, management, media, finance, or coaching, you should be equipped with fundamental sports analytics knowledge which enables you to digest analytics reports, generate scientific insights, communicate with analytics departments/groups, and eventually come up with sound data-driven decisions.