The purpose of this project is to understand the skills and competencies required for jobs that potential ESM graduates may be qualified for.
Project files can be found here: https://github.com/acircleda/Data-Jobs
All data comes from administrative job ads on HigherEdJobs, collected using the following searches:
The initial plan was to use the rvest webscraper package to automatically crawl these searches and scrape these jobs. However, while testing, I realized only a limited amount of data could be downloaded before the webscraper was blocked as a bot.
Due to webscraping limitations, jobs were manually downloaded from the searches. This had the added advantage of “human filtering” - removing jobs that were unrelated or clearly required discipline-specific degrees (e.g. business, biostats).
All downloaded jobs can be found in the job-files directory on GitHub.
The content of the downloaded files is extracted using the rvest package. The code can be found in the scraping.R file. The extracted text is then analyzed using tidytext and the words from the word lists.R file.
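As a rough illustration of that workflow, here is a minimal sketch rather than the actual contents of scraping.R. The inline HTML strings, the `#ad` selector, and the four-word `programs` list are all hypothetical stand-ins, since the real file markup and word lists are not shown here. It also assumes each word should be counted at most once per job; dropping the `distinct()` step would count every occurrence instead.

```r
library(rvest)    # read_html(), html_element(), html_text2()
library(dplyr)    # tibble(), filter(), distinct(), count()
library(tidytext) # unnest_tokens()

# Hypothetical stand-ins for two saved job ads. The real files live in the
# job-files directory and their markup may differ.
job_html <- c(
  job1 = "<html><body><div id='ad'>Experience with SQL, Tableau and R required.</div></body></html>",
  job2 = "<html><body><div id='ad'>Strong SQL and SPSS skills; SQL reporting.</div></body></html>"
)

# Hypothetical word list. The project's actual lists are in word lists.R.
programs <- c("sql", "tableau", "r", "spss")

# Pull the visible text out of each ad (the '#ad' selector is an assumption).
ads <- tibble(
  job  = names(job_html),
  text = vapply(
    job_html,
    function(h) html_text2(html_element(read_html(h), "#ad")),
    character(1),
    USE.NAMES = FALSE
  )
)

program_counts <- ads %>%
  unnest_tokens(word, text) %>%    # one lowercased word per row
  filter(word %in% programs) %>%   # keep only words on the list
  distinct(job, word) %>%          # assumption: count a word once per job
  count(word, sort = TRUE)         # number of jobs mentioning each word

print(program_counts)
```

For the real data, the inline strings would be replaced by reading each file in job-files, for example by mapping `read_html()` over `list.files("job-files", full.names = TRUE)`.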
Number of Jobs Analyzed |
---|
92 |

Program/Language | n |
---|---|
sql | 39 |
tableau | 35 |
database | 34 |
excel | 34 |
r | 29 |
spss | 28 |
sas | 26 |
python | 21 |
oracle | 11 |
warehouse | 11 |
qualtrics | 10 |
stata | 10 |
cognos | 6 |
powerbi | 5 |
java | 3 |
nvivo | 3 |
minitab | 2 |
salesforce | 2 |
sap | 2 |
scala | 2 |
vba | 2 |
apache | 1 |
arcgis | 1 |
hadoop | 1 |
html | 1 |
javascript | 1 |
matlab | 1 |
qlik | 1 |

Statistics | n |
---|---|
mining | 13 |
descriptive | 9 |
inferential | 5 |
logistic | 4 |
regression | 4 |
factor | 3 |
projections | 3 |
forecasts | 2 |
hierarchical | 2 |
anova | 1 |
correlation | 1 |
correlations | 1 |
pca | 1 |
structural | 1 |

Other Skills | n |
---|---|
reporting | 68 |
statistics | 52 |
presentation | 44 |
visualization | 34 |
report | 33 |
qualitative | 30 |
survey | 29 |
presentations | 25 |
dashboards | 20 |
cleaning | 12 |
graphs | 12 |
warehouse | 11 |
dashboard | 5 |
questionnaire | 2 |
wrangling | 1 |

To develop this project more, I plan to: