Category Archives: Data Science in the Industry

Population Bias In ML Based Medical Research

Unequal outcomes in medical research has been an ongoing issue, but a new study indicates that machine learning may not be an automatic solution to this problem. (Conev, et al. 2024)

A team of researchers from Rice University in Houston, Texas have recently published a study examining how the utilization of a biased dataset within a machine learning model can result in a disparity of immunotherapy treatments across different income classifications and geographic populations.

In an analysis of available datasets the team found that these datasets were “biased toward the countries with higher income levels.” Several solutions are suggested, including a conscious effort to expand data collection to under-represented geographic populations as well as creating models that train on the characteristics of each individual patient.

Conev, A., Fasoulis, R., Hall-Swan, S., Ferreira, R., Kavraki, L. (2024) ‘HLAEquity: Examining biases in pan-allele peptide-HLA binding predictors’, iScience, 27(1), https://www.sciencedirect.com/science/article/pii/S2589004223026901

Image of Python group in Excel Ribbon

Coming to an Excel Near You…

Python. Yes, the ever-popular Python programming language is set to be featured in Excel in the near future. Microsoft is partnering with Anaconda, a leading enterprise grade Python repository. Python in Excel leverages the Anaconda distribution for Python running in Azure, which includes the most popular Python libraries such as pandas for data manipulation, statsmodels for advanced statistical modeling, as well as Matplotlib and seaborn for data visualization. For now Python in Excel is only available in public preview for those in the Microsoft 365 Insiders program Beta Channel. However, this feature will roll out to Excel for Windows first, starting with build 16818, and then to the other platforms at a later date. But soon you will be able to do advanced data analysis in the familiar Excel environment by accessing Python directly from the Excel ribbon. No set-up or installation will be required. Read more about Python in Excel.

Can ChatGPT help us in Data Science?

By now you have probably heard about or tried ChatGPT.  ChatGPT is a Natural Language Processing “chatbot,” developed by OpenAI and released to the public in November of 2022 for testing. ChaptGPT is currently free to use on the web for anyone with an OpenAI account, which is currently also free. 

In a nutshell, you ask ChatGPT a question in a “conversational way” and it can use “the whole Internet” to help it provide a response – although its knowledge cutoff is currently 2021 (ask it about an event that occurred or a celebrity that passed away in the last year and you’ll see what I mean). The release of NLP chatbots like this is huge in a lot of ways, and scary in a lot of ways. Like… if AI like this can do your job in less time for less cost, is your job in danger? I’ve been hearing that one for years, and have somehow been able to stay under the radar.  😉

Anyway, how can ChatGPT be of use in analytics and data science? Well, let’s ask!

Q. How can ChatGPT help a data analyst?

A.  “ChatGPT can assist a data analyst in several ways, such as:

  1. Natural Language Processing (NLP) tasks, such as text summarization and sentiment analysis.
  2. Generating human-like text based on the data provided, such as automatically generating reports or descriptions of the data.
  3. Helping to clean and prepare data for analysis by identifying and extracting relevant information from unstructured text data.
  4. Automating repetitive tasks such as data entry or data cleaning.
  5. Answering questions about the data, by training on specific data sets, allowing the data analyst to query the model and receive insights on the data.

Overall, ChatGPT can save time and effort for a data analyst by automating certain tasks and providing insights and information quickly.

I input the question above into ChatGPT and the answer you see was provided in a matter of seconds – I didn’t write that answer. Interestingly enough, ChatGPT will not give you a duplicate answer if you input the same question twice. In fact, there is a “Regenerate Response” button at the bottom of the page which provides a new response in case you didn’t like the first one. Let’s try that…

A2.  “ChatGPT can assist a data analyst in a number of ways, including:

  1. Data cleaning and preprocessing: ChatGPT can be used to automate tasks such as removing outliers, filling in missing values, and transforming data into a format that is ready for analysis.
  2. Text summarization: ChatGPT can be used to summarize large amounts of text data, such as customer feedback or social media posts, making it easier for analysts to identify key trends and patterns.
  3. Natural language processing: ChatGPT can be used to extract insights from unstructured text data by identifying entities, sentiment, and topics.
  4. Generating reports and visualizations: ChatGPT can be used to generate automated reports and visualizations based on the data analyzed, making it easier for analysts to communicate their findings to stakeholders.
  5. Automating repetitive tasks: ChatGPT can automate repetitive tasks such as data entry and data validation, freeing up data analysts to focus on more complex and important tasks.

And you can ask ChatGPT to elaborate or give a more specific example of any part of the response you get. You can keep tweaking or rephrasing your question until you get the response that is most helpful or relevant. 

ChatGPT has a number of applications in writing code as well. You can ask ChatGPT to write the code needed to accomplish a task in virtually any programming language or even in common applications like Excel. For example, if you’re not familiar with all of the functions in Excel (and who is?), you can ask ChatGPT to write a formula or function needed to accomplish the task. You may need to input the coordinates (rows and columns) the formula or function should act upon and then you can likely just copy & paste the code into the formula bar or cells in Excel. Or if you are trying to automate a task, you can ask ChatGPT to write an Excel macro, then copy/paste the Visual Basic code into Excel – [Alt+F11] Win or [Opt+F11] Mac. There are also applications for learning to code: Let’s say you are reasonably fluent in R but are trying to learn Python. You can input some code in R and ask ChatGPT to give you the equivalent code in Python, or vice-versa. And ChatGPT may ask clarifying questions to help debug the code. And this is just the tip o’ the iceberg, as they say. There are limitations and thing to watch out for. You can find these and more information at the OpenAI > ChatGPT website. Very cool, try it while it’s still free!

Share the Love: Data Analytics, Coding, and Ohio Tech Day

True confession: I’m daunted by data analytics and coding. Well, I used to be. Maybe this is because I’ve always thought of myself as a language nerd. Among my circle of friends in college, I was the only one majoring in two languages—English and German. Actually, I started off pre-med, bouncing around from major to major until I finally committed to what I loved—and what I feared wouldn’t land me a job. But what I realized not too long after graduation is that my humanities education helped me focus on nuances, on the details that often go unnoticed, on the stories inherent in a seeming jumble of symbols on a page or screen.

Flash-forward several years to me as a grad student at Miami University. I was finally pursuing my other scholastic love duo—design and tech. Walking one day to King Library, I saw a sandwich board advertising a new Data Analytics Awareness Microcredential from the Center for Analytics & Data Science (CADS). Free to current students! And starting this week! Was this for me? I liked the “awareness” part. Yeah, I got this. I eagerly snapped a pic of the ad and signed up as soon as I got home.

Flash-forward again a few years, now to me as marketing director at Joot, a local tech startup that I was fortunate to discover at Miami’s Architecture, Design & Emerging Technology Fair. I’m lucky to have landed here, having walked a rather winding path to this exciting financial-services-plus-tech destination. One of the first opportunities I learned about after joining Joot is our partnership with OhioX, a nonprofit helping people from all walks of life participate in and contribute to the vast world of tech and innovation, right here in Ohio. Working on a few OhioX committees introduced me to Ohio Tech Day, a free event many months in the making and finally kicking off on September 24, 2021.

Here’s where things come full circle. As a self-proclaimed language nerd, I was given the chance, thanks to CADS, to get my Data Analytics Awareness Microcredential, which demystified data analytics and coding to the point where I could actually do this stuff—and even apply these skills in my daily work. I had to find a way to give back. So I partnered with Phillip DiSilvestri, Joot’s incredibly talented sales and marketing intern (and current Miami student), to develop a coding tutorial that empowers students to learn the fundamentals of data analytics and coding in order to find relevant scholarship opportunities. This tutorial replicates, on a small scale, the kind of engagement and access I enjoyed through the CADS microcredential program.

But the story’s not over yet. Our coding tutorial, which uses data from DataOhio, sparked several conversations with companies and schools interested in making the tutorial even more engaging and accessible to more people. With the tremendous help of Dolr, we’re expanding the tutorial into more programming languages—including Python, which the CADS microcredential introduced me to—and giving students more opportunities to share their work.

Our scholarship map coding tutorial is living proof that anyone can do data analytics and coding. And anyone can apply these highly valuable skills to do good—for ourselves and others who maybe, at first, didn’t think we could.

About the Author

Karen Mauk graduated from Miami University’s Graduate School with an MFA in Experience Design (2021) and a graduate certificate in Interactive Media Studies (2019). She also participated in Miami’s miniMBA (2021) and Data Analytics Awareness Microcredential (2020) programs.

A Couple of Notes on Learning Sport Analytics

It is more than obvious that sport analytics is booming thanks to the technology breakthrough, algorithm advancement, and upgraded analytical tools.

If you are a fan of sport analytics… this thought probably has emerged in your mind: how can I be able to perform analytics on sport data?

Let me start with some analytical tools. Either R or Python would be great. These open-resource tools have a wide spectrum of packages allowing you to address various sport analytics issues from fundamental data wrangling to cutting-edge deep learning. As time goes by, more diverse and powerful packages would be developed or updated, so you don’t need to keep learning new tools to address different aspects of sport analytics. As a follow-up question: which one should I eventually choose, R or Python? Here is some information that may help you make a final selection. R comparatively has more well-developed packages in data analytics and has a more convenient IDE (i.e., RStudio) than Python IDEs (e.g., Spyder or PyCharm). But if you’re also interested in other programming fields like web design and app development in addition to sport analytics, Python is a better option given its generic nature. Please note this is status quo. Both R and Python have been updating themselves. For example, in recent years, R added more packages in deep learning and text mining; meanwhile, the data analytics arena of Python also got richer and richer.

A main point distinguishing sport analytics from general data analytics is its contextualized attributes, such as analytics purposes, data features, analytics criteria, and result interpretations. To gain a better understanding on these contextulized attributes, you need to know your target sport program(s) in depth. So where to start to gain these insights? Observing training and games, communicating with coaches and players, learning from experienced analysts in that specific field are highly valuable. As a Miamian, working for varsity teams on campus would be a great start. Don’t forget to educate yourself via industry conferences and theoretical courses such as sport economics, sport psychology, athletic training, coaching, etc.

Having systematic trainings is crucial for a successful run in sport analytics. Oftentimes, un-rigorous analytics are worse than no analytics, which is largely amplified in the field of sport analytics given its high stake. Therefore, going through systematic trainings in algorithm, probability, measurement, and algebra are necessary. You should also pay attention to the technology sector which has been reshaping the landscape of sports analytics. For example, movement tracking systems (e.g., SportUV, Rapsodo, and K-motion) and health mentoring equipment have elevated sport performance analytics to another level.

Lastly, I would like say that conducting sport analytics professionally is a niche field, whereas sport analytics education is for all practitioners, as well as for those working in related industries. No matter if you are inspired to work in sport marketing, management, media, finance, or coaching, you should be equipped with fundamental sports analytics knowledge which enables you to digest analytics reports, generate scientific insights, communicate with analytics departments/groups, and eventually come up with sound data-driven decisions.

About the Author

Dr. Jerred Junqi Wang is an Assistant Professor at Miami University in the Sport Leadership and Management Department.

Lessons from Heathrow Airport’s Data-Informed Customer Service Strategy

A while ago, before international travel was restricted due to the global pandemic, I made my way to London Heathrow International Airport to catch a flight back home after a business trip. To cater for what I expected was going to be a long wait at the security check, I arrived at the airport well ahead of the recommended time. Heathrow Airport, after all, is reputed to be one of the world’s busiest airports in terms of passenger traffic. I wasn’t taking any chances.

Instead of the dread I expected, I was pleasantly surprised. The process was quick and smooth. It took much less time than I had anticipated. To boot, the security officers were very courteous and helpful. I wondered if I was just lucky to have arrived at a slow time.

After some inquiry, I concluded that my experience could most likely be attributed to a data-informed customer feedback system that Heathrow airport had implemented to improve service delivery. Let me explain.

The Smiley Face Terminal

While en route to the departure lounge to wait for my flight, I absentmindedly pressed one of the smiley face buttons on a terminal indicating how I felt to the question, “How was your security experience today?” Much like one would do on a social media post. I didn’t think much of it until I arrived at the lounge and looked up at the large electronic panels to locate my departure gate.

Next to lists of departing flights, each one of the electronic panels had a dashboard with bar graphs showing levels of customer satisfaction with the airport’s performance on a number of service performance targets. The bar graphs showed how each of the five terminals in Heathrow had performed in service level performance targets, such as security wait times of less than five minutes.

It was fascinating to see that all the terminals had achieved their targets 98% of the time over the month. For example, one bar graph showed that a terminal had scored 4.21 and above out of a possible high score of 5 in customer satisfaction level. I had a light bulb moment when I realized that the smiley face terminal I had just encountered was a tool for collecting feedback from travelers about their experience.

Immediately, I applied my investigative skills to find out how Heathrow came up with this tool and how it was implemented.

The HappyOrNot Innovation

An internet search of “smiley face terminal” brought up articles in the New Yorker and the Telegraph describing this innovation. I discovered that it was developed by a small Finnish startup led by a programmer called Heikki Väänänen. As the story goes, Heikki liked to give feedback to establishments he patronized, but filling out surveys was inconvenient. Apparently, he is not alone.

Heikki saw this as a business opportunity and developed the so-called “HappyOrNot” terminals. The terminals have buttons with smiley faces (made ubiquitous by social media), ranging from very happy to dissatisfied, for customers to register their level of satisfaction. They are simple to use and send data directly to people who are interested in the results.

Heathrow Airport implemented the HappyOrNot terminals to improve customer satisfaction levels ahead of the expected influx of international visitors for the 2012 Summer Olympics. They positioned HappyOrNot terminals so that passengers could use them as they cleared security. Airport executives were able identify problem locations in real time, and address the issues quickly. At the time, Heathrow security’s over-all passenger-satisfaction scores are reported to have risen by more than half.

Contrast this with the Transportation Security Administration (T.S.A.) feedback system that the New Yorker writer encountered at Palm Beach International Airport. Customers were invited to scan a QR code to inform T.S.A. of their experience. After scanning the QR code, the writer ended up with a link to T.S.A.’s general customer-service web page.

The writer then had to search for a form which required him to provide his full name and email address to create an account so he could then log in to provide feedback! Hardly the way to encourage customer feedback!

Data-Informed Decision Making

Organizational and business leaders all over the world are increasingly applying innovative ways to gather and analyze customer feedback to gain insights to improve their systems.   Management scholars have called this concept organizational learning. The term “learning organization” was made popular by Peter Senge in his book The Fifth Discipline.

One characteristic of learning organizations is that they need to be able to detect and correct errors. To do this, an organization needs to be able to seek out and seriously consider feedback about the work employees are engaged in, the wider environment, and its performance.

In the past, organizations collected and then analyzed data “off-line” in order to determine corrective actions to take. Modern technology, however, has now made it possible to learn at the very moment of action.

Heathrow Airport, for example, has a key performance index of getting travelers through security in 5 minutes or less. HappyOrNot terminals have proven to be an effective tool for the officials to collect real-time data from customers on how they are performing on this service target. Actionable insights enable airport officials to pinpoint problem areas and attend to them immediately, thus increasing their competitiveness as an air transportation hub.

Data from customer feedback is not only useful in the scenario just described involving single-loop feedback – a response that tells you you need to adjust how you’re acting. Insights from data can also generate double-loop feedback –  a response that tells you that your approach or strategy needs to change. Data are not only useful in detecting discontinuities between strategy and performance in the present, but also among possible visions, strategies, and specific goals for the future that an organization has.

Organizations that are going to be competitive in the fast-paced, interconnected and globalized world where large amounts of data and automation are the norm, are going to be organizations that are quick to learn and adapt at every level of the organization. Data-informed decision making is crucial for individuals, teams, organizations and larger institutions to become more capable of self-transformation with the result that they are more creative, more aware, more just, and more sustainable.

About the Author

Peter Maribei, PhD is the Associate Director of Education Abroad at Miami University – Oxford. He manages faculty led study away and study abroad programs. Over the course of his career, he has developed international internship and exchange programs and taught students from the Americas, Europe, and Africa. His scholarly interests lie in experiential learning strategies that contribute towards the development of moral imagination, intercultural competence and socially responsible leadership