In 2008, we accurately predicted the current rate of unemployment in Italy!

A year ago, we revisited our 2008 prediction of the rate of unemployment in Italy and found that it was excellent. The model worked well. Since it has an 11-year horizon, we can now check our old prediction against the data for 2012 and 2013 (preliminary). The new estimate of the unemployment rate for 2012 is 10.6%; for 2013, the rate is 11.5% in March. There is no doubt that these values fully validate our model of unemployment as a function of the change in labour force.

We introduced the model of unemployment in Italy in 2008, with data available only through 2006. The rate of unemployment was near its bottom, at the level of 6%. The model predicted long-term growth in the rate of unemployment, to the level of 11% in 2013-2014.

The agreement between the measured and predicted unemployment estimates in Italy validates our concept that there exists a long-term equilibrium link between unemployment, u(t), and the rate of change of labour force, l(t) = dLF(t)/[LF(t)dt]. Italy is a unique economy for validating this link because unemployment lags l(t) by eleven (!) years.

The estimation method is standard: we seek the best overall fit between the observed and predicted curves by the LSQR method. All in all, the best-fit equation is as follows:

u(t) = 5.0l(t-11) + 0.07    (1)

As mentioned above, l(t) leads unemployment by eleven years. This defines the rate of unemployment many years ahead of the current change in labour force.

Figure 1 presents the observed unemployment curve and that predicted from equation (1) using the rate of labour force change eleven years earlier. Since the labour force estimates for Italy are very noisy, we smoothed the annual predicted curve with MA(5). All in all, the predictive power of the model is excellent: it fits the timing of major peaks and troughs after 1988, and the period between 2006 and 2013 was predicted almost exactly. (If anybody knows a better 2008 prediction of the 2013 unemployment rate, please send us the link.) This is the best validation of the model: it successfully described a major turn in the evolution of unemployment near its bottom. No other macroeconomic model is capable of describing such dramatic turns many years ahead. Four years ago, we expected the rate of unemployment to peak in 2013-2014 at the level of 11% (+5% from the 2008 level), and the peak has come!

The evolution of the rate of unemployment in Italy is completely defined eleven years ahead. Since the linear coefficient in (1) is positive, one needs to reduce the growth in labour force (see Figure 2) in order to reduce unemployment in the 2020s. For the 2010s, everything is already predefined, and the rate of unemployment will stay high, i.e. above 9%.
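As a minimal numerical sketch of how equation (1) is applied, the snippet below uses a purely hypothetical labour force series (not the actual Italian data) to form l(t), apply the 11-year lag, and smooth the prediction with MA(5):

```python
import numpy as np

# Hypothetical annual labour force levels (thousands), for illustration only
lf = np.array([22000, 22110, 22250, 22310, 22280, 22350, 22500, 22610,
               22650, 22700, 22760, 22800, 22850, 22910, 22950, 23000,
               23060, 23100, 23150, 23210], dtype=float)

# l(t) = dLF/(LF dt): annual relative change in labour force
l = np.diff(lf) / lf[:-1]

# Equation (1): u(t) = 5.0*l(t-11) + 0.07, i.e. an 11-year lag
lag = 11
u_pred = 5.0 * l[:len(l) - lag] + 0.07

# The labour force series is noisy, so smooth the prediction with MA(5)
u_smooth = np.convolve(u_pred, np.ones(5) / 5.0, mode="valid")
```

The predicted curve starts eleven years after the first labour force reading, which is exactly why the 2008 vintage of the data already fixed the path through 2013-2014.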

Figure 1. Observed and predicted rate of unemployment in Italy.
Figure 2. The rate of growth in labour force.


Waveform cross correlation at the International Data Centre: comparison with Reviewed Event Bulletin and regional catalogues

Another poster at the EGU 2013.

Waveform cross correlation substantially improves detection, phase association, and event building procedures at the International Data Centre (IDC) of the Comprehensive Nuclear-Test-Ban Treaty Organization. From 50% to 100% more events than in the official Reviewed Event Bulletin (REB) were found in the aftershock sequences of small, middle-size, and very big earthquakes. Several per cent of the events reported in the REB were not found with cross correlation even when all aftershocks were used as master events. These REB events are scrutinized in interactive analysis in order to reveal the reason for the cross correlation failure. As a corroborative method, we use detailed regional catalogues, which often include aftershocks with magnitudes between 2.0 and 3.0. Since the resolution of regional networks is higher by at least one unit of magnitude, the REB events missing from the relevant regional catalogues are considered bogus. We compare events by origin time and location because the regional networks and the International Monitoring System are based on different sets of seismic stations, and phase comparison is not possible. Three intracontinental sequences have been studied: after the March 20, 2008 earthquake in China (mb(IDC)=5.4), the May 20, 2012 event in Italy (mb(IDC)=5.3), and one earthquake (mb(IDC)=5.6) in Virginia, USA (August 23, 2011). Overall, most of the events not found by cross correlation are missing from the relevant regional catalogues. At the same time, these catalogues confirm most of the additional REB events found only by cross correlation. This observation supports all previous findings of the improved quality of events built by cross correlation.
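The detection step underlying this procedure can be illustrated with a toy template-matching sketch on synthetic data (the actual IDC processing works on IMS array channels and is far more elaborate):

```python
import numpy as np

def sliding_cc(trace, template):
    """Normalized cross correlation coefficient of a master template
    slid along a continuous trace; values close to 1 flag a detection."""
    n = len(template)
    t = (template - template.mean()) / template.std()
    cc = np.zeros(len(trace) - n + 1)
    for i in range(len(cc)):
        w = trace[i:i + n]
        s = w.std()
        if s > 0:
            cc[i] = np.dot(w - w.mean(), t) / (n * s)
    return cc

# Synthetic example: bury a scaled copy of the master signal in noise
rng = np.random.default_rng(1)
template = np.sin(np.linspace(0, 6 * np.pi, 100))
trace = 0.2 * rng.standard_normal(2000)
trace[700:800] += 1.5 * template          # hidden "aftershock"

cc = sliding_cc(trace, template)
detections = np.flatnonzero(cc > 0.7)     # threshold on the CC trace
```

Because the statistic is normalized, even small-amplitude repeats of the master waveform stand out above the noise, which is what lets cross correlation recover events below the conventional detection threshold.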

Seismicity of the North Atlantic as measured by the International Data Centre

I have uploaded a poster presented at the EGU 2013.

The Technical Secretariat (TS) of the Comprehensive Nuclear Test-Ban Treaty Organization (CTBTO) will carry out verification of the CTBT, which obligates each State Party not to carry out nuclear explosions. The International Data Centre (IDC) receives, collects, processes, analyses, reports on, and archives data from the International Monitoring System (IMS). The IDC is responsible for automatic and interactive processing of IMS data and for standard IDC products. The IDC is also required by the Treaty to progressively enhance its technical capabilities. In this study, we use waveform cross correlation as a technique to improve the detection capability and reliability of the seismic part of the IMS. In order to quantitatively estimate the gain cross correlation provides over the current sensitivity of automatic and interactive processing, we compared seismic bulletins built for the North Atlantic (NA), a seismically isolated region with earthquakes concentrated around the Mid-Atlantic Ridge. This isolation avoids the spill-over of mislocated events between adjacent seismic regions and biases in the final bulletins: the Reviewed Event Bulletin (REB) issued by the IDC and the cross correlation Standard Event List (XSEL). To begin with, we cross correlated waveforms recorded at 18 IMS array stations from 1500 events reported in the REB between 2009 and 2011. The resulting cross correlation matrix revealed the best candidates for master events. We selected 60 master events evenly distributed over the seismically active zone in the NA. High-quality signals (SNR > 5.0) recorded by the 10 most sensitive array stations were used as waveform templates. These templates were then used for continuous calculation of cross correlation coefficients over the first half of 2012.
All detections obtained by cross correlation are then used to build events according to the current IDC definition: at least three primary stations with accurate arrival time, azimuth, and slowness estimates. The qualified event hypotheses populate the XSEL. In order to confirm the XSEL events not found in the REB, a portion of the newly built events was reviewed interactively by experienced analysts. The influence of all defining parameters (cross correlation coefficient threshold, SNR, f-k analysis, azimuth and slowness estimates, relative magnitude, etc.) on the final XSEL has been studied by comparing the relevant frequency distributions for all detections vs. only those associated with XSEL events. These distributions are also station and master dependent. This allows estimating thresholds for all defining parameters, which may be adjusted to balance the rate of missed events against false alarms.
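The event-building criterion (at least three stations) can be sketched with a toy local-association routine; the station names, arrival times, and travel times below are purely illustrative, not actual IMS measurements:

```python
import numpy as np

def associate(arrivals, travel_times, tol=2.0, min_stations=3):
    """Build an event hypothesis if the back-projected origin times of
    at least `min_stations` stations agree within `tol` seconds."""
    origins = np.array(sorted(arrivals[s] - travel_times[s] for s in arrivals))
    best = origins[:0]
    for t0 in origins:
        cluster = origins[(origins >= t0) & (origins <= t0 + tol)]
        if len(cluster) > len(best):
            best = cluster
    return float(np.median(best)) if len(best) >= min_stations else None

# Hypothetical arrivals (s) and master/station travel times (s):
# three stations agree on an origin near t = 100 s; NOA is an outlier
arrivals = {"ASAR": 462.1, "WRA": 518.9, "ILAR": 640.3, "NOA": 300.0}
ttimes   = {"ASAR": 362.0, "WRA": 419.2, "ILAR": 540.1, "NOA": 100.0}

print(associate(arrivals, ttimes))   # origin time near 100.1 s
```

Subtracting the known master/station travel time turns each arrival into a candidate origin time, so a genuine event appears as a tight cluster while spurious detections scatter.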


Spain: 33% unemployed in 2013? Revisited

Today, Bloomberg discusses the rate of unemployment in Spain, which rose to 27.2%. This level is interpreted as unexpectedly high. In October 2012, we described the evolution of unemployment in Spain (it was 25% on the date of posting) and predicted a rate of unemployment of 33% if real GDP falls by 10% in 2013. We use the LSQ technique applied to the integral version of Okun's law:

u(t) = u(t0) + b ln[G/G0] + a(t-t0)    (1)

where u(t) is the rate of unemployment at time t, G is the level of real GDP per capita (we used TED, The Conference Board, EKS PPP), and a and b are empirical coefficients. The best-fit (dynamic) model for Spain, minimizing the RMS error of the cumulative model (1), is as follows:

du = -0.406 dlnG + 2.00, t < 1995
du = -1.11 dlnG + 1.54,  t > 1994    (2)

This model suggests a big shift in the slope and a smaller change in the intercept around 1995. With a new unemployment estimate for 2012, we have updated the prediction of (2) in Figure 1. The 2012 real GDP reading fully confirms the excellent predictive power of the model: the predicted value is 25.5% and the reported rate of unemployment in the fourth quarter of 2012 is 26%.

Figure 1 also repeats our prediction (red circle) of the unemployment rate in Spain in 2013 for a 10% fall in real GDP. The Bank of Spain reports a 2% fall in Q1, which is much smaller than 10%. In any case, the current economic performance of Spain is poor. From (2), it follows that even zero GDP growth results in a 1.5% increase in unemployment per year.
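A back-of-the-envelope sketch of model (2) in code; the coefficients are from the text, while the GDP path in the loop is a purely hypothetical assumption:

```python
def du_spain(dlnG, year):
    """Annual change in Spain's unemployment rate (percentage points) from
    piecewise model (2); dlnG is the change in log real GDP per capita, in %."""
    if year < 1995:
        return -0.406 * dlnG + 2.00
    return -1.11 * dlnG + 1.54

# Even zero growth after 1994 adds about 1.5 points a year
print(du_spain(0.0, 2013))   # 1.54

# Cumulative trajectory from u = 25% under an assumed two-year GDP path
u = 25.0
for dlnG in (-1.4, -2.0):    # hypothetical GDP changes, in %
    u += du_spain(dlnG, 2013)
print(round(u, 1))           # 31.9
```

Because the model is integral, each year's GDP shortfall is added on top of the previous unemployment level, which is how a sustained contraction compounds into the 33% scenario.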

Figure 1. The observed and predicted rate of unemployment in Spain between 1971 and 2012. In 2013, the rate may reach 33% in case of a 10% fall in real GDP.

The cumulative form of the dynamic Okun's law is characterized by a standard error of 1.68% for the period between 1971 and 2011 (0.92% after 1995). The average rate of unemployment over the same period is 13.6% (14.6% after 1995), with a standard deviation of the annual increment of 2.12%.


CPI deflation in Japan will strengthen

I've borrowed this table from Japan Statistics. Considering that only the prices of fuel and transportation have been rising, one can interpret the current fall in the oil price as a cause of an overall price drop in the months to come. Price deflation is strengthening.

Table 1. Japan, February 2013

Item                                               Index   Monthly change (%)   Annual change (%)
(10 Major Group Index)
All items                                           99.2        -0.2                -0.7
All items, less imputed rent                        99.2        -0.2                -0.7
Food                                                99.3        -1.0                -1.8
Housing                                             99.2         0.0                -0.4
Fuel, light and water charges                      108.7         0.1                 3.0
Furniture and household utensils                    89.9         0.7                -3.8
Clothes and footwear                                95.5        -0.9                -0.5
Medical care                                        98.0         0.0                -0.5
Transportation and communication                   101.9         0.6                 0.9
Education                                           98.5         0.1                 0.4
Culture and recreation                              92.1         0.0                -2.8
Miscellaneous                                      103.4         0.2                -0.1
All items, less fresh food                          99.2         0.1                -0.3
All items, less food (less alcoholic beverages)
  and energy                                        97.6         0.0                -0.9
(Goods and Service Group Index)
Goods                                               98.7        -0.3                -1.3
Services                                            99.6         0.0                -0.1


The rate of unemployment on its way to 6% in December 2013

Four months ago, we discussed the rate of unemployment in the US and published our forecast for 2013. We predicted an extended period of falling unemployment, down to the level of 6.2% in the third quarter of 2013. This prediction was made after we had accurately forecast (on March 1, 2012) that the rate of unemployment in the US would fall to 7.8% by the end of 2012. Here we update our model and present the evolution of the unemployment rate in the first quarter of 2013. The measured rate has been following our prediction. We foresee the rate falling to 6% in the fourth quarter of 2013.

In 2006, we developed three individual empirical relationships between the rate of unemployment, u(t), price inflation, p(t), and the rate of change of labour force, LF(t), in the United States. We also revealed a general relationship balancing all three variables. Since the measurement (including definition) errors in the three variables are independent, it may happen that they cancel each other (destructive interference), so the general relationship might have better statistical properties than the individual ones. For the USA, the best-fit model for annual estimates was as follows:

u(t) = p(t-2.5) + 2.5[dLF(t-5)/dt]/LF(t-5) + 0.0585    (1)

where inflation (CPI) leads unemployment by 2.5 years (30 months) and the change in labor force leads by 5 years (60 months). We have already posted on the performance of this model several times.

For the model in this post, we use monthly estimates of the headline CPI, u, and labor force, all reported by the US Bureau of Labor Statistics. The time lags are the same as in (1), but the coefficients are different since we use month-to-month-a-year-ago rates of growth. We have also allowed the inflation coefficient to change. The best-fit models for the period after 1978 are as follows:

u(t) = 0.63p(t-2.5) + 2.0[dLF(t-5)/dt]/LF(t-5) + 0.07; between 1978 and 2003

u(t) = 0.90p(t-2.5) + 4.0[dLF(t-5)/dt]/LF(t-5) + 0.30; after 2003

There is a structural break in 2003, which is needed to fit the predictions to the observations in Figure 1. Due to strong fluctuations in the monthly estimates of labor force and CPI, we smoothed the predicted curve with MA(24).
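The monthly scheme can be sketched as follows, with synthetic input series standing in for the BLS CPI and labour force data (coefficients taken from the models above):

```python
import numpy as np

LAG_P, LAG_L = 30, 60        # months: the 2.5-year and 5-year leads

def u_model(p, l, t, year):
    """u(t) from year-on-year inflation p and labour force growth l,
    using the pre-/post-2003 coefficient sets quoted in the text."""
    if year < 2003:
        return 0.63 * p[t - LAG_P] + 2.0 * l[t - LAG_L] + 0.07
    return 0.90 * p[t - LAG_P] + 4.0 * l[t - LAG_L] + 0.30

# Synthetic monthly inputs, purely illustrative
rng = np.random.default_rng(0)
p = 0.02 + 0.01 * rng.standard_normal(240)    # inflation rate
l = 0.01 + 0.005 * rng.standard_normal(240)   # labour force growth

u = np.array([u_model(p, l, t, 2010) for t in range(LAG_L, 240)])

# Monthly series are noisy, hence the MA(24) smoothing of the prediction
u_smooth = np.convolve(u, np.ones(24) / 24.0, mode="valid")
```

The 60-month lead means the predicted curve for the next five years depends only on data already published, which is what makes the out-of-sample forecast possible.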

The structural break in 2003 may be associated with a change in the sensitivity of the rate of unemployment to the changes in inflation and labor force. Alternatively, the definitions of all three (or two) variables were revised around 2003, which is the year when new population controls were introduced by the BLS. The Census Bureau also reports major revisions to the Current Population Survey, from which the estimates of labor force and unemployment are taken. Therefore, the reason behind the change in coefficients might be of artificial character: a change in measuring units.

Figure 1 depicts the prediction and the observed fall in the rate of unemployment. Figure 2 shows that the observed and predicted time series are well correlated (Rsq = 0.82). This is good statistical support for the model.

Figure 3 depicts the predicted rate of unemployment for the next 12 months. The model shows that the rate will fall to 6.0% by December 2013. For 110 observations since 2003, the modelling error is 0.4%, with the precision of unemployment rate measurement at 0.2% (Census Bureau estimates in Technical Paper 66). Hence, one may expect 6.0% [±0.4%]. Meanwhile, we expect a dramatic drop in the rate of unemployment in April-June 2013. It should come as "unexpected" to mainstream economic forecasters.
Figure 1. Observed and predicted rate of unemployment in the USA as obtained in April 2013.

Figure 2. Observed vs. predicted rate of unemployment between 1967 and March 2013. The coefficient of determination Rsq = 0.82.

Figure 3. The predicted rate of unemployment. We expect the rate to fall to 6.0% in December 2013.


An unusual feature of income distribution in the USA - a Pareto law at very low incomes

There is an invaluable source of quantitative information on personal income distribution (PID) in the USA hidden in a set of clumsy photocopies one can find by searching the most remote parts of the Census Bureau web site. Today, we discuss the distribution of personal money incomes in the age group 14 to 15 years. This is an extremely narrow group with data available only between 1967 and 1974. The essence of this short time series is the power (Pareto) law distribution of personal incomes starting from the zero level. It is a well-known fact that the highest incomes follow the Pareto law, as do many distributions in physics associated with self-organized criticality. Therefore, we consider the 14-15-year PID as an extremely important quantitative feature, likely related to the frozen structure of income distribution rather than to the long-term process. These people have just entered the economy and were distributed over the pre-existing structure straight away.

A few words about the data. I am a long-term user of the CB data, but it took me a couple of minutes to get through a broken link to the directory http://www.census.gov/prod/www/population.html, where the original copies sit. They are not digitized. Fortunately, I did my homework in 2003 and spent a month converting them into digital format. These data allowed us to develop a mechanical model for the evolution of personal incomes since 1947, presented in my book "Mechanical model of personal income distribution". There is much more left in the dataset, however.

Figure 1 below displays four personal income distributions for 1968. We present three age groups: 14 to 15, 16 to 19, and 20 to 24 years, as well as the whole population distribution marked "all". In the group 14 to 15 years, there exists a power law distribution with an exponent of -2.01 over the whole income range. In the group between 20 and 24 and in "all", only higher incomes (>$5000) are distributed according to the Pareto law. The PID for the group 16 to 19 is rather exponential. Hence, the PID evolves over time: it starts as a power law for very young people and then transforms into an exponential distribution after a few years of work experience. After five to seven years of work, two branches are observed: a quasi-exponential PID at lower incomes and a power-law one at higher incomes. It is important to stress that the Pareto law for higher incomes is characterized by a different exponent of -3.2 (see Figure 2). This is a quite different distribution.
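The exponent estimation reduces to a least-squares fit in log-log coordinates; the sketch below uses an idealized binned power-law distribution, not the actual Census data:

```python
import numpy as np

# Idealized income bins (dollars) and counts drawn from a pure power law
# with exponent -2, mimicking the 14-15-year-old group
income = np.array([500., 1000., 2000., 4000., 8000., 16000.])
counts = 1.0e9 / income**2

# Pareto exponent from a least-squares fit in log-log coordinates
slope, intercept = np.polyfit(np.log(income), np.log(counts), 1)
print(round(slope, 2))   # -2.0
```

On real binned data the points scatter around the fitted line, and the recovered slope is the exponent quoted in the text (-2.01 for the youngest group, -3.2 for high incomes).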

By all means, the PID evolution expresses both the underlying structure of income distribution and the long-term process. It is highly unfortunate that the Census Bureau reported such a short time series for younger people and did not provide reliable estimates of the Pareto distribution for higher incomes: only three to four estimates covering the range of high incomes.
Figure 1. Four PIDs in 1968. Notice the power law distribution over the whole income range in the group 14 to 15, as well as for higher incomes (>$5000) in the group between 20 and 24 and in "all". The PID for the group 16 to 19 is rather exponential.
Figure 2. The estimation of two exponents for power law PIDs.


Money income inequality in the USA has not been changing since 1947 in all age groups

The Census Bureau reports money income distribution measured in the annual Current Population Surveys. Figure 1 shows the portion of population with income in various age groups. From 1947 to 1977, the portion with income grew linearly. In 1947, only 64.4% of the working age population in the USA had incomes according to the CB definition. In 1990, the portion peaked at 93.3%. The group between 15 and 24 years of age has a limited time span and demonstrates a dramatic fall after 1994. In other age groups, the portion has also fallen since 1994, but at a slower rate. In 1994, a new income definition and measuring procedure were introduced. The CB reports the CPS results only from 1994 on; before that, only scanned copies of paper reports were available.

The estimates of money income inequality depend on the income definition and thus on the portion of population with income. Figure 2 depicts the evolution of the Gini ratio in the same age groups, which reveals a similar growth between 1947 and 1977. When the change in the portion of population with income in the past is compensated for, the Gini ratios in Figure 2 are likely constant since 1947.

The level of (money) income inequality in various age groups is practically constant over time.

It is worth noting that the jump in the portions of population with income between 1977 and 1979 did affect the Gini estimates: a step of 0.02 is observed in all groups above 25 years of age. The youngest group and the overall estimate did not show any change.

Figure 1. The portion of population with income in various age groups as reported by the Census Bureau.
Figure 2. Age dependent Gini ratio since 1947 as measured from personal income distributions published by the Census Bureau. “With income” – the Gini ratio for the whole working age population with income.


Money income inequality in the USA is rock solid

Ten years ago, we started modelling income inequality in the US using data gathered by the Census Bureau. Personal money income has been measured in the annual Current Population Surveys (CPS) since 1947. The CB counts the population in various income bins and publishes the distribution of personal incomes. Unfortunately for researchers, only PDF scans are available before 1994. When these scans were still available, we downloaded them and converted them into digital format. These distributions allow direct calculation of the Gini ratio. Figure 1 demonstrates that the level of inequality, as expressed by money income, has been rock solid since the late 1950s, oscillating between 0.51 and 0.52. This is the longest and most consistent time series of income inequality.
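The direct Gini calculation from binned counts amounts to integrating the Lorenz curve; here is a minimal sketch with illustrative bins (not the actual CPS data):

```python
import numpy as np

def gini_from_bins(bin_means, counts):
    """Gini ratio from a binned income distribution: sort bins by income,
    build the Lorenz curve, and integrate it by the trapezoid rule."""
    order = np.argsort(bin_means)
    x = np.asarray(counts, float)[order]            # population per bin
    y = x * np.asarray(bin_means, float)[order]     # income per bin
    cum_pop = np.concatenate(([0.0], np.cumsum(x))) / x.sum()
    cum_inc = np.concatenate(([0.0], np.cumsum(y))) / y.sum()
    # area under the Lorenz curve (trapezoid rule)
    area = np.sum((cum_pop[1:] - cum_pop[:-1]) * (cum_inc[1:] + cum_inc[:-1]) / 2.0)
    return 1.0 - 2.0 * area

# A perfectly equal distribution gives 0; a skewed one a high ratio
print(round(gini_from_bins([1, 100], [99, 1]), 2))   # 0.49
```

Using bin means rather than the full within-bin distribution slightly underestimates inequality, which is one reason consistent methodology matters for comparing years.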

Between 1947 and 1959, the Gini ratio grew.

Figure 1. Gini ratio since 1947


The use of synthetic master events for waveform cross correlation

The EGU General Assembly 2013 starts on Monday, April 8. We have three posters on various aspects of waveform cross correlation in seismic monitoring. Here is the link to one of them in PDF. www.academy.edu is again very useful.

And the abstract:

The use of synthetic master events for waveform cross correlation


It has been clearly demonstrated that waveform cross correlation substantially improves signal detection, phase association, and event building, processes at the core of Comprehensive Nuclear-Test-Ban Treaty (CTBT) monitoring. The workhorse of cross correlation is a set of seismic master events (earthquakes or explosions) with high-quality waveform templates recorded at array stations of the International Monitoring System (IMS). For the monitoring to be globally uniform, these master events have to be evenly distributed, and their template waveforms should be representative and pure. However, global seismicity is non-uniform, and so master events selected from the Reviewed Event Bulletin (REB) produced by the International Data Centre (IDC) can only be found within the areas of actual seismicity. There are two principal possibilities for populating the globe with master events: to replicate real REB events or to build synthetic events. Here we compare the performance of these two approaches as applied to the aftershock sequence of the April 11, 2012 Sumatera earthquake. To compute synthetic waveforms, we use the AK135 teleseismic velocity model, local CRUST-2 models for the source and receiver, and four different source functions representing three source mechanisms for earthquakes and one for an explosion. The synthetic modelling is performed for teleseismic events and is based on the stationary phase approximation to a wave equation solution developed by J. Hudson. The grid covering the aftershock area consists of 16 points. For each grid point, we find detections associated with real, replicated, and four versions of synthetic master events at seven IMS array stations, and then build event hypotheses using the Local Association (LA) procedure, based on the clustering of origin times as estimated by back projection of the relevant arrival times with known master/station travel times.
Then all conflicts between hypotheses built by different masters for the physically same event are resolved. There are two principal ways to compare the performance of actual, replicated, and synthetic master events: to compare the characteristics/distributions of detections (also station dependent) and those of event hypotheses. Both datasets have shown that the synthetic events provide the same overall performance as the real and replicated master events. The best performance is associated with the explosion source and with the earthquake using the Harvard CMT solution of one of the real events. When the source mechanism and velocity model are chosen appropriately, a global grid of synthetic masters may allow a reduction in the magnitude threshold of seismic monitoring and an improvement in the accuracy and uncertainty of event locations at the IDC to the level of the best-located events. When a ground truth event is available, one can thus expand its influence over hundreds of kilometres.
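The conflict-resolution step can be sketched with a simplified toy rule (the actual procedure weighs more attributes than the average correlation coefficient; all numbers below are hypothetical):

```python
def resolve_conflicts(hypotheses, tol=10.0):
    """Merge event hypotheses (origin_time, n_stations, avg_cc) built by
    different masters: hypotheses within `tol` seconds are assumed to be
    the same physical event, and the highest average CC wins."""
    kept = []
    for h in sorted(hypotheses):
        if kept and h[0] - kept[-1][0] <= tol:
            if h[2] > kept[-1][2]:
                kept[-1] = h       # better hypothesis for the same event
        else:
            kept.append(h)         # a new, distinct event
    return kept

# Two masters built the same event near t = 100 s; one distinct event at 500 s
hyps = [(100.0, 4, 0.62), (102.5, 5, 0.81), (500.0, 3, 0.70)]
print(resolve_conflicts(hyps))   # keeps (102.5, 5, 0.81) and (500.0, 3, 0.70)
```

With a dense grid of masters, many grid points detect the same aftershock, so some deduplication rule of this kind is unavoidable before the final event list is formed.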