On Aggregating Salaries of Occupations From Job Post and Review Data

Abstract:

The popularity of job websites has significantly changed the way people learn about different occupations. Among the insights these websites offer are occupation salary statistics, which are useful to job seekers, career coaches, graduating students, and labor-related government agencies. Such statistics summarize the distribution of job salaries for each occupation, for example through its average or quantiles. However, when we gather job post and review data from job websites, we find significant variability in offer salaries (and review salaries) among jobs of the same occupation. This variability points to the existence of biases, including salary competitiveness in job posts and salary inflation in job reviews. Based on this observation, we aim to develop an approach that derives the occupation salary for a job market, which we call the unbiased salary, by aggregating offer salaries from job posts and review salaries from review data while removing their biases. To achieve this goal, we propose the COC-model, which efficiently learns the unbiased salaries of occupations together with the competitiveness and inflation of companies. COC is an abbreviation of “Company, Occupation, Company”, which reflects the two different connections between companies and occupations, one from the job posting site and one from the job review site. The COC-model represents the dependency of salary information between companies and occupations in job post data and job review data. It begins by defining three latent variables, namely competitiveness, inflation, and unbiased salary, based on their dependencies. Instead of computing these variables iteratively, we formulate the interactions among them in matrix form so that their values can be learned efficiently in a unified way through a series of matrix operations. Extensive experiments are conducted, including empirical studies of the competitiveness and inflation of companies using a real dataset and performance testing on a synthetic dataset. The experimental results show that the COC-model not only derives unbiased salaries effectively but also helps us understand latent biases in job post and job review data.