This proposal sets out to analyze the economics question of VAST Challenge 2022 (Challenge 3), covering business growth prospects, the financial health of residents, and employment, using visual analytics.
The fictitious town of Engagement, Ohio is marketed as the future of the USA. In keeping with this, the city is studying opportunities to keep itself competitive in the future.
We have a sample of 1000 volunteers who, through the city’s urban planning app, provided data recording the places they visit, their spending, and their purchases, among a host of other information.
We aim to deliver insights through a platform that an end user can easily access and modify. The platform must also be able to wrangle large amounts of data smoothly. To do this we will use R, which is open-source and freely available. We will leverage Shiny, ggplot2 and other relevant R packages to create an interactive website that visualizes business prosperity, the financial health of residents and the health of employers.
Given that there is no real town of Engagement, Ohio, we will use the available literature on the state of Ohio as a reference.
Ohio had a gross state product of $615.6bn with growth of 1.5% over 5 years, ranking 30th among the 50 US states. The economy of Ohio (based on 2018 data) consists primarily of Manufacturing (18.5%), Real estate (12.2%), Financial services (10.3%) and Healthcare (9.8%) (IBISWorld, 2022). Up-and-coming sectors are Materials extraction (25.7% growth), Finance (8%), Logistics (7.6%) and Construction (6.7%) (IBISWorld, 2022). At state level, the key opportunities for a large industry with strong growth potential would therefore lie in financial services. We will look to augment this view with our study.
Shiny [1] is an open-source R package, free of charge, for building interactive, modern web apps from R. An app can either stand alone or be embedded in an R Markdown document. According to an article by Debolina Biswas [2], compared with other visualisation tools such as Tableau, Shiny can connect to any data source and offers powerful data manipulation, wrangling and analysis tools, as well as statistical modelling and advanced forecasting packages. However, the user must know how to code, which can be challenging for non-coders.
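To make the standalone-app idea concrete, a minimal sketch of a Shiny app is shown below. The `visits` data frame, its column names and its values are hypothetical placeholders rather than challenge data, and the layout is only one of many possible designs.

```r
library(shiny)

# Hypothetical toy data standing in for the challenge data (values are made up).
visits <- data.frame(
  month      = rep(month.abb[1:3], times = 2),
  venue_type = rep(c("Pub", "Restaurant"), each = 3),
  n_visits   = c(120, 135, 128, 210, 198, 225)
)

# User interface: one drop-down input and one plot output.
ui <- fluidPage(
  titlePanel("Venue visits (toy data)"),
  selectInput("venue", "Venue type", choices = unique(visits$venue_type)),
  plotOutput("trend")
)

# Server logic: re-draw the bar chart whenever the selected venue type changes.
server <- function(input, output, session) {
  output$trend <- renderPlot({
    selected <- visits[visits$venue_type == input$venue, ]
    barplot(selected$n_visits, names.arg = selected$month,
            ylab = "Visits", main = input$venue)
  })
}

app <- shinyApp(ui, server)

# Launch the app only when running in an interactive R session.
if (interactive()) runApp(app)
```

The same `ui` and `server` pair could later be extended with additional inputs and outputs for each of the analyses in this proposal.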
Data visualisation [3] represents information and data with visual elements such as charts, graphs, and maps, so that users can easily see and understand trends, outliers, and patterns in data. There are many different visualisation techniques [4]; for example, a timeline chart is very effective for visualizing a sequence of events in chronological order.
In this project, our team will use Shiny and the most
appropriate visualisations to study how the city’s economic
situation changes over the study period.
### 3. Project Objectives
The objective of this project is to use suitable visualisations to show how the economic condition of the city changes over the study period. Based on the available data, we will examine business performance, the job market, and residents’ standard of living. We plan to plot the following graphs in our R Shiny app:
The data set is obtained from https://vast-challenge.github.io/2022/. It contains records for 1000
residents over a 15-month period starting 1st March 2022.
The datasets relevant to the challenge that our study will explore are
as follows:
Diagram 1: Process Diagram
The Process Diagram shows the overall process of the group project. First, we use suitable R packages to prepare the data and shape it into the structures needed for exploration. Next, we use simple static visualisation methods to get an overview of the patterns for each challenge. Finally, we use more sophisticated interactive visualisation methods to explore and present more detailed information for each challenge.
The health of a business will be assessed on 3 main metrics:
To establish the health of the entire industry, one metric to understand is how much of the addressable market is being engaged. To do this, we will look at how many participants in our sample visit pubs and restaurants. Given that the entire addressable market is the whole population, this shows whether there is room for the F&B industry to grow. We can assess the following insights on pubs and restaurants:
One level below the entire industry, we look at how each individual establishment attracts customers in order to:
- Propose manpower plans based on lulls and peaks in business.
- Identify potentially floundering businesses by geography, or establish whether performance is independent of geography (using a less aggregated view of the data, e.g. change in number of employees, churn, etc.).
Beyond foot traffic, we will also check that foot
traffic translates into actual revenue for the businesses. We will
use a weekly heatmap to identify which periods are the most
profitable for the respective businesses, so that they can plan their revenue streams.
We could also potentially identify higher-value customers.
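As a rough illustration of the weekly heatmap, the sketch below aggregates spending by weekday and hour of day and, if ggplot2 is installed, draws it with `geom_tile()`. The `spend` data frame, its column names and its values are assumptions made for illustration and do not reflect the actual challenge files.

```r
# Hypothetical transaction records; column names and values are assumptions.
spend <- data.frame(
  timestamp = as.POSIXct(c(
    "2022-03-04 12:30:00", "2022-03-04 19:15:00",
    "2022-03-05 19:45:00", "2022-03-11 12:10:00",
    "2022-03-12 20:05:00"
  ), tz = "UTC"),
  amount = c(12.5, 30, 42, 15, 38)
)

# Derive weekday and hour of day without relying on the system locale.
day_labels <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
lt <- as.POSIXlt(spend$timestamp, tz = "UTC")
spend$weekday <- factor(day_labels[lt$wday + 1], levels = day_labels)
spend$hour <- lt$hour

# Total revenue for each weekday/hour cell that has at least one transaction.
weekly <- aggregate(amount ~ weekday + hour, data = spend, FUN = sum)

# Draw the heatmap only if ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  heatmap_plot <- ggplot2::ggplot(
    weekly, ggplot2::aes(x = hour, y = weekday, fill = amount)
  ) +
    ggplot2::geom_tile() +
    ggplot2::labs(x = "Hour of day", y = "Day of week", fill = "Revenue")
}
```

Cells with no transactions are simply absent from `weekly`, so they appear as gaps in the heatmap rather than as zero revenue.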
We will break down the typical resident’s expenditure (rent,
food, entertainment, etc.) to understand what the typical resident spends
their money on:
1) Explore salary and expenditure patterns.
2) Compare joviality against salary, education, or recreational activities.
3) Add interactivity to make the charts more revealing, e.g. tooltips.
Going a level down, we will then try to understand the savings rate
of the sample and see if there is any relationship of wages and
expenditure to education level.
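One possible way to compute the savings rate is sketched below, assuming it is defined as (income − expenses) / income per participant and then averaged by education level. The `ledger` and `participants` data frames, their column names and their values are hypothetical, as is the convention that income is positive and expenses are negative.

```r
# Hypothetical ledger: positive amounts are income, negative amounts are expenses.
ledger <- data.frame(
  participantId = c(1, 1, 1, 2, 2, 2, 3, 3),
  category = c("Wage", "Shelter", "Food", "Wage", "Shelter", "Food", "Wage", "Food"),
  amount = c(4000, -1200, -400, 2500, -900, -350, 3000, -600)
)

# Hypothetical participant attributes.
participants <- data.frame(
  participantId = c(1, 2, 3),
  educationLevel = c("Graduate", "HighSchoolOrCollege", "Graduate")
)

# Total income and total expenses per participant.
income <- aggregate(amount ~ participantId,
                    data = ledger[ledger$amount > 0, ], FUN = sum)
names(income)[2] <- "income"
expense <- aggregate(amount ~ participantId,
                     data = ledger[ledger$amount < 0, ],
                     FUN = function(x) -sum(x))
names(expense)[2] <- "expense"

# Combine with participant attributes and compute the savings rate.
summary_tbl <- merge(merge(income, expense, by = "participantId"),
                     participants, by = "participantId")
summary_tbl$savings_rate <-
  (summary_tbl$income - summary_tbl$expense) / summary_tbl$income

# Mean savings rate for each education level.
by_education <- aggregate(savings_rate ~ educationLevel,
                          data = summary_tbl, FUN = mean)
```

Participants with no recorded income or no recorded expenses drop out of the inner merges here; whether to keep them is a design choice the team would still need to make.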
We will then endeavour to understand how education spending is
related to having kids and to the education level of individuals.
We will try to assess the turnover rates of employers by correlating the sample’s job changes with the available employers by location. This might generate some geographic insights; however, it is limited by the high number of companies relative to the number of participants in the sample.
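A minimal sketch of one way to derive employer turnover from job changes follows. It assumes a table with one row per participant per period giving the current employer, and it defines turnover as departures divided by the number of distinct participants ever observed at that employer. Both the `status` table and this definition are illustrative assumptions, and the join to employer locations is left out.

```r
# Hypothetical employment status: one row per participant per week.
status <- data.frame(
  participantId = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  week = rep(1:3, times = 3),
  employerId = c("A", "A", "B", "A", "A", "A", "B", "C", "C")
)
status <- status[order(status$participantId, status$week), ]

# Employer in the previous period for the same participant (NA for the first row).
prev <- ave(status$employerId, status$participantId,
            FUN = function(x) c(NA, head(x, -1)))

# A job change occurs when the employer differs from the previous period.
status$left <- !is.na(prev) & prev != status$employerId

# Headcount: distinct participants ever observed at each employer.
pairs <- unique(status[, c("participantId", "employerId")])
headcount <- table(pairs$employerId)
employers <- names(headcount)

# Departures are attributed to the employer the participant left.
departures <- table(factor(prev[status$left], levels = employers))

turnover <- data.frame(
  employerId = employers,
  headcount = as.integer(headcount),
  departures = as.integer(departures),
  turnover_rate = as.numeric(departures) / as.numeric(headcount)
)
```

With so few sampled participants per employer, rates computed this way will be noisy, which is the limitation noted above.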
An indicator of employers’ longer-term economic health is the quality of
the jobs they provide. We will use the wages of the jobs offered as an
indicator of this.
The project timeline is set as follows:
- Shiny: an R package that makes it easy to build interactive web apps straight from R.
- tidyverse: an opinionated collection of R packages designed for data science.
- igraph: a collection of library functions for creating and manipulating graphs and analyzing networks.
- tidygraph: an approach to manipulating the node and edge data frames of a graph using the API defined in the ‘dplyr’ package, which also provides tidy interfaces to many common graph algorithms.
- ggraph: an extension of the ggplot2 API tailored to graph visualizations, providing the same flexible approach to building up plots layer by layer.
- visNetwork: an R package for network visualization.
- lubridate: makes it easier to do the things R does with date-times and possible to do the things R does not.
- clock: an R package for working with date-times.
[1] Shiny from RStudio, https://shiny.rstudio.com/
[2] Debolina Biswas, 5 Oct 2021, https://analyticsindiamag.com/tableau-vs-shiny-which-one-should-you-pick-for-data-visualisation/
[3] https://www.tableau.com/learn/articles/data-visualization
[4] Kelsey Miller, 17 Data Visualisation Techniques All Professionals Should Know, https://online.hbs.edu/blog/post/data-visualization-techniques