Occasional Papers



Topic 8

Do you need fast training for your network models?

This paper is about training differentiable models to take inputs, and to approximate, or "regress to", expected or desired outputs.

The paper includes the training of differentiable models with feedback, so-called "recurrent" models. Models with feedback can be more difficult to train, but they have important applications in our physical world.

Sometimes we train a model just to have a general equation that relates inputs to outputs.

Sometimes we train a model to diagnose how the outputs depend on the inputs. We put data in, we usually get expected data out, and then we say "Now I am going to diagnose how these outputs depend on those inputs under the current model, and maybe I can make the model better!"

Sometimes we train a model to predict specific outputs. We put in today's data, we get expected data out, and we say "Ah hah, that is what would - or will - happen in today's case!"

These goals may sound nice, but are they realistic? How easy is it to load enough of the information that exists in a lot of data, onto just a few numbers, the few so-called "parameters" of a model? How easy is it to get good-enough results out?

The answer is that sometimes it is easy, sometimes it is hard, and sometimes it is impossible. There is a whole science about the difficulty of loading different types of information onto different types of models.

Here is an easy case. Take (x,y) pairs that lie more-or-less on a straight line, and train the best-fitting linear model, y = ax + b. We only have to find two parameter values, "a" and "b", to get a good approximation of the desired or expected outputs.
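Here is a minimal sketch of this easy case in Python. The five (x,y) pairs are made up for illustration, and the function name fit_line is ours; the fit itself is the ordinary least-squares solution for "a" and "b".

```python
def fit_line(xs, ys):
    """Return (a, b) that minimize the summed squared error of y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Made-up points that lie more-or-less on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

a, b = fit_line(xs, ys)
print(f"a = {a:.2f}, b = {b:.2f}")  # a = 1.97, b = 1.06
```

Two numbers are enough to summarize all five points, which is what makes this case easy.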

Here is an impossible case. Take time-ordered (x,y) pairs measured from the trajectory of an iron ball fired into the air from a cannon, and train the best-fitting linear model again. The model is too simple: one best-fitting linear model cannot do a good job of approximating these y's as a function of x.
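To see how badly a straight line does here, consider this small sketch. The launch speeds, the value of g, and the idealized drag-free trajectory are all assumptions made for illustration; they are not data from the paper.

```python
import math


def fit_line(xs, ys):
    """Return (a, b) that minimize the summed squared error of y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


# Idealized cannon ball: no air resistance, assumed launch velocity.
G = 9.81               # m/s^2
VX, VY = 10.0, 20.0    # horizontal and vertical launch speeds in m/s

flight_time = 2.0 * VY / G
times = [flight_time * i / 10 for i in range(11)]
xs = [VX * t for t in times]                      # distance down-range
ys = [VY * t - 0.5 * G * t * t for t in times]    # height above ground

a, b = fit_line(xs, ys)
rms = math.sqrt(sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs))
peak = max(ys)

print(f"slope = {a:.6f}, intercept = {b:.2f} m")
print(f"rms error = {rms:.2f} m, peak height = {peak:.2f} m")
```

Because the trajectory rises and falls symmetrically, the best-fitting line comes out essentially flat, and its typical error is about a third of the peak height. No choice of "a" and "b" can do better.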

Between the easy cases and the impossible cases are the rest of the possible cases, where a model has at least enough of the right degrees of freedom, and where there is enough data to learn to approximate or "regress" the inputs to the expected or desired outputs.

However, these harder cases, which are possible in theory, may still be impossible in practice, in particular because our approach to training might not be good enough.

Suppose the way we initialize our model keeps it from learning to be a good predictor of the expected or desired outputs. Now what could we do?

Suppose our training algorithm takes too long. What if we need to train and apply a new model on each day's data, but training takes longer than a day? Now what?

Suppose our model has too many degrees of freedom. How could we reduce the tendency of such a model to overfit the data? How could we make our model adapt itself, to have effectively a smaller and better form for solving our problem?

Suppose our application is to train a model on continuously varying inputs, to produce discretely changing outputs. For example, suppose our application is to produce correct results in a classification task. The right answer in one four-way classification might be [1,0,0,0], not [0,1,0,0], [0,0,1,0] or [0,0,0,1]. How could we train our model to do a better job of discriminating right answers from wrong answers, instead of just approximating right answers?
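The difference between approximating and discriminating can be shown with a small sketch. The two output vectors below are invented for illustration, and squared error is assumed as the measure of approximation; the paper may use other measures.

```python
def squared_error(output, target):
    """Summed squared difference between a model output and its target."""
    return sum((o - t) ** 2 for o, t in zip(output, target))


def predicted_class(output):
    """Index of the largest output, taken as the model's decision."""
    return max(range(len(output)), key=lambda i: output[i])


target = [1, 0, 0, 0]             # the right answer is class 0

output_a = [0.4, 0.3, 0.3, 0.3]   # poor approximation, right decision
output_b = [0.5, 0.55, 0.0, 0.0]  # closer approximation, wrong decision

for name, out in (("A", output_a), ("B", output_b)):
    print(name, round(squared_error(out, target), 4), predicted_class(out))
```

Output B is closer to the target in squared error, yet it picks the wrong class, while output A picks the right one. A training criterion that only rewards approximation can therefore prefer the worse classifier.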

Suppose it is hard to avoid putting into our model at least some data that should be treated as irrelevant. What if our model learns to do a better job of matching the desired outputs for that irrelevant data, by doing worse on the part of the data that we really care about?

Suppose our application is to produce discretely changing classification-like outputs, but it would be hard for the model to learn to map its continuously varying inputs to a discrete, desired or "target" output value like [1,0,0,0] that we externally supply. Are there alternatives to using an externally supplied, discrete target function?

This paper addresses these practical problems for multi-layered network models that are either feedforward or recurrent in structure, on an application in speech recognition.

All models in this paper are based on a superposition of logistic "sigmoid" functions, each of the form y = 1/(1+e^(-x)). All are trained to discriminate the letter names "b", "d", "e" and "v", as spoken by different talkers.
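As a rough illustration of what a superposition of sigmoids can do, here is a sketch in Python. The weights, biases and the function name superposition are assumptions chosen for this example, not the networks from the paper.

```python
import math


def sigmoid(x):
    """The logistic function y = 1/(1+e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def superposition(x, units):
    """Weighted sum of sigmoids; each unit is (w_out, w_in, bias)."""
    return sum(w_out * sigmoid(w_in * x + bias) for w_out, w_in, bias in units)


# Two sigmoids, one rising at x = -2 and one subtracted at x = +2,
# add up to a smooth "bump" centered on x = 0.
bump_units = [(1.0, 1.0, 2.0), (-1.0, 1.0, -2.0)]

for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(f"x = {x:6.1f}   y = {superposition(x, bump_units):.4f}")
```

A single sigmoid can only rise or fall. Adding a second one already gives a shape that rises and then falls, and more units give more flexible shapes.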

Warning: these suggestions may not be easy to implement. Until we debug our implementations, they will not work as we want. After some amount of work we may ask ourselves "Why are we bothering?"

We should remember this enjoyable benefit of taking something that seems impossible in practice and turning it into something which is readily doable: if we persist, the model may not be the only thing that learns; we ourselves may learn as well. So persist, learn and enjoy!

Could suggestions from this paper help produce good results in your applications?

To read this paper, co-authored with my friend Dr. Norman Herzberg, click on this link: 100117_Variations_on_Training.pdf

A network with 652 parameters is trained in 100 iterations through the data.






Topic 7

Interested in comparing feed-forward and recurrent sensitivities in speech recognition?

This paper is about diagnosing systems whose inputs and outputs vary over time.

In a "feed-forward" system, the effects of a given input pass through the system in a finite amount of time, after which that input no longer affects the outputs. For example, a healthier live chicken is taken to a packing plant, and some time later a better piece of chicken meat comes out in a shrink-wrapped package.

In a "feed-back" or "recurrent" system, previous outputs are included among following inputs, so an input at one point in time can potentially affect outputs forever after. For example, a healthier chicken is put into a breeding process, and more and healthier chickens may then be available in the next cycles of the breeding process.
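The two kinds of system can be contrasted with a toy sketch. Both update rules and the coefficient 0.5 are made up for illustration; they are not the models in the paper.

```python
def feed_forward(xs):
    """Output depends only on the current and the previous input."""
    ys, x_prev = [], 0.0
    for x in xs:
        ys.append(0.5 * x + 0.5 * x_prev)
        x_prev = x
    return ys


def recurrent(xs):
    """The previous output is fed back in alongside the current input."""
    ys, y_prev = [], 0.0
    for x in xs:
        y_prev = x + 0.5 * y_prev
        ys.append(y_prev)
    return ys


# A single input pulse at time 0, then nothing.
pulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]

print(feed_forward(pulse))  # [0.5, 0.5, 0.0, 0.0, 0.0, 0.0]
print(recurrent(pulse))     # [1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125]
```

In the feed-forward system the pulse has passed through after two time steps. In the recurrent system its effect shrinks but never reaches exactly zero.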

In this paper, both types of systems are used, and they are diagnosed. When we say "diagnose", we mean diagnosing the dependence of outputs on earlier inputs.

One way to diagnose the dependence of outputs on earlier inputs is called "calculating the sensitivity".

Here is a technical question that can be answered by calculating the sensitivity: if we make a small increase in one of the system inputs at a given point in time, how much increase or decrease will we see in each of the system outputs over all following points in time?
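One simple way to answer this question numerically is to nudge an input and re-run the system. The sketch below does that for a single recurrent sigmoid unit; the unit, its weights, the input sequence and the step size eps are all assumptions for illustration. A differentiable model lets us compute the same quantities with derivatives, and the paper's own method may differ from this finite-difference shortcut.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def run_recurrent(xs, w_in=1.0, w_fb=0.5):
    """One sigmoid unit whose previous output is fed back as an input."""
    ys, y_prev = [], 0.0
    for x in xs:
        y_prev = sigmoid(w_in * x + w_fb * y_prev)
        ys.append(y_prev)
    return ys


def sensitivity(xs, t0, eps=1e-6):
    """Change in every output per unit increase of the input at time t0."""
    base = run_recurrent(xs)
    bumped = list(xs)
    bumped[t0] += eps
    perturbed = run_recurrent(bumped)
    return [(p - b) / eps for p, b in zip(perturbed, base)]


xs = [0.0, 0.5, 1.0, 0.5, 0.0, 0.0]   # made-up input sequence
sens = sensitivity(xs, t0=2)

for t, s in enumerate(sens):
    print(f"t = {t}: sensitivity to the input at time 2 = {s:.6f}")
```

The sensitivities are zero before time 2, largest at time 2, and then fade as the feedback carries a shrinking trace of the nudge forward in time.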

One way to calculate sensitivities is to train what mathematicians call a "differentiable model".

In a differentiable model, outputs, called "y", are related to inputs, called "x", by a differentiable function, called "f". We say "y equals f of x", and we write "y = f(x)". This notation may be familiar to you from high-school algebra.

To train a differentiable model, we structure the model and train its weights "w", to approximate desired outputs. Maybe we choose a structure like "y = f(w*x)", maybe we choose "y = w*f(x)", or maybe we choose both, with "y = w2*f(w1*x)". Maybe the model is a cascade of such structures.
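As a minimal sketch of such training, the Python below fits the structure "y = w2*f(w1*x)" by plain gradient descent. Taking f to be the logistic sigmoid, generating the desired outputs from w1 = 2 and w2 = 3, and the learning rate and step count are all assumptions made for this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, w1, w2):
    """The structure y = w2*f(w1*x), with f assumed to be the sigmoid."""
    return w2 * sigmoid(w1 * x)


def mean_squared_error(xs, ts, w1, w2):
    return sum((model(x, w1, w2) - t) ** 2 for x, t in zip(xs, ts)) / len(xs)


def train(xs, ts, w1, w2, rate=0.1, steps=2000):
    """Plain gradient descent on the mean squared error."""
    n = len(xs)
    for _ in range(steps):
        g1 = g2 = 0.0
        for x, t in zip(xs, ts):
            f = sigmoid(w1 * x)
            err = w2 * f - t
            g1 += 2.0 * err * w2 * f * (1.0 - f) * x / n  # d(error)/d(w1)
            g2 += 2.0 * err * f / n                       # d(error)/d(w2)
        w1 -= rate * g1
        w2 -= rate * g2
    return w1, w2


# Made-up desired outputs, generated from w1 = 2 and w2 = 3.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ts = [model(x, 2.0, 3.0) for x in xs]

w1_start, w2_start = 0.5, 1.0
initial_loss = mean_squared_error(xs, ts, w1_start, w2_start)
w1, w2 = train(xs, ts, w1_start, w2_start)
final_loss = mean_squared_error(xs, ts, w1, w2)

print(f"error before training: {initial_loss:.4f}")
print(f"error after training:  {final_loss:.6f}  (w1 = {w1:.3f}, w2 = {w2:.3f})")
```

Training works only because the model is differentiable: the derivative terms f*(1-f) and f tell each weight which way to move.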

I have posted a paper above (Topic 8) about training such models. Let me point out again here that having structured and trained a model, we can diagnose the input-output dependencies of the trained model.

Note, however, that when a system undergoes change, we may have to re-train the model.

Why bother with any of this? Because once we have modeled how a system's outputs depend on its inputs, we can predict the outputs, or we can try to modify the inputs or the structure of the system to improve the outputs.

This process of modeling how a system works, and how a system might be modified to work better, can contribute to scientific discovery.

Applications abound, either in physical science or in social science. Maybe we are studying chickens, maybe we are studying market responses, or maybe we are studying the education of children who will grow up to be voting adults.

You can learn about differentiable models or about approximating desired outputs in courses on differential calculus or applied regression analysis.

With this kind of education, you can find a job with a technical branch of a company or of a government.

Should we be interested in making these kinds of scientific discoveries? Should we be interested in these kinds of jobs?

If you think so, you can see why taking math at least through differential calculus and applied regression analysis might be a good investment for our future.

This paper is an application of sensitivity analysis to speech recognition, co-authored with my friend Dr. Raymond Watrous.

To read this paper, click on this link: 090716_Comparison_of_Sensitivities.pdf

Features and feed-forward sensitivities for poorly-discriminated training examples






Topic 6

Has the US Republican Party lost its connection to what we like and admire about business?

The states that voted Republican in the 2008 US presidential election tend to be lower in per capita GDP, lower in K-12 education, and lower in public health. This statement will come as a surprise to some and as an outrage to others. What, if any, are its implications for US voters and for the Republican Party?

Click this link for this essay in .htm format: 090524_Republican_Party.htm
Click this link for the same essay in .pdf format: 090524_Republican_Party.pdf

Comments:

AMEN!!! TD, May 25, 2009

Interesting, very GKuhn-like data. But the question you allow yourself to ask is not exciting. There is a conclusion that is begging to be made from this, and I wish I knew what it is. SJ, May 26, 2009

Indeed, the graph is very interesting. As you say, causality is another matter, but one is tempted to make a connection: low GDP -> low education -> conservative religious -> Republican. I may be perfectly wrong, or a victim of some prejudice. PI, May 26, 2009





Topic 5

Interested in the JUPITER Crestor study and an opportunity for health-care reform in the US?

The recent "JUPITER" Crestor study showed that the statin drug Crestor cut the risk of heart attack and stroke for people with normal cholesterol but elevated C-reactive protein (an indicator of inflammation in the arteries). After a lengthy discussion of the results of this study, we review
-- some life-style changes that I made to reduce inflammation and the risk of blood clots,
-- some potent compounds that we can choose to ingest for the same purposes, from so-called "food", and
-- an opportunity for health-care education in the US, which opens up a whole list of population-based problems that we need to address in the US.
This paper was the basis for a discussion held on March 9, 2009 with Dr. Majid Ali, on WBAI radio, 99.5 FM, in New York.

Click this link for this essay in .htm format: 090309_for_Majid_Ali.htm
Click this link for the same essay in .pdf format: 090309_for_Majid_Ali.pdf

Statistics for 50 US states and Washington DC. Top: K-12 education rankings, source: US Chamber of Commerce, 2007. Bottom: Health-care scores, source: United Health Foundation, 2008, graphed by Time Magazine.






Topic 4

Want a four-move solution to Rubik's cube?

This document describes a four-move solution for three versions of the cube puzzle originally sold by Ideal Toys under the name "Rubik's Cube".

One version of the puzzle is the original 3x3x3 cube which has colored squares on its sides. The colors are white, green, orange, blue, red and yellow. The second version is the 3x3x3 cube for the visually handicapped sold by LS&S Group. This version has raised symbols on its sides. The symbols are a triangle, a hollow circle, a hollow square, a filled circle, a filled square, and a letter "x". The third version is the 2x2x2 cube puzzle sold by Winning Moves, which looks like the head of Matt Groening's cartoon character Homer Simpson. Click the link above for this report from November, 2007.






Topic 3

How does the 2007 US Chamber of Commerce education report card relate to a Pandora's box of population-based problems for the US and a democratic opportunity for US business?

According to a 2007 US Chamber of Commerce (CofC) study, the measures of the shortcomings of the US K-12th grade education system "are stark indeed". We analyze a state-based summary of this study and discover that the results are even worse when viewed in population-based terms.

By comparing results in state-based, elector-based and population-based terms we quantify insensitivities of the US presidency, the US Senate and the US constitutional amendment process to population-based problems, and an inability of the US government to adapt to population-based problems.

We name a large number of population-based problems that the US must address. In addition to K-12th grade education, these include health-care, illegal drugs, violent crime, homeland security, retraining of unemployed adults, support for retirement, and the exhausting of natural resources like oil.

We argue that it is urgent for the US to address these problems, which means that the US must respond promptly to the double challenge of (1) poor K-12th grade education, and (2) the broader political Pandora's box of an unrepresentative government with additional, urgent, population-based problems.

This double challenge presents a democratic opportunity for business, which has historically benefited from the democratization of government.

A first step would be to support full, population-based selection of the US president, by a system of one person, one vote. The next question is how to accomplish this step promptly when the normal way of not changing the US Constitution, namely the constitutional amendment process, is still in place.

Our answer is to help pass a state-by-state compact called "National Popular Vote". ( www.nationalpopularvote.com )

We ask the US Chamber of Commerce and US business in general to please announce their support for National Popular Vote, and we foresee that further steps, like population-based voting in the US Senate, and population-based voting for amendments to the US Constitution, could be built on business's success in this first step.

Click this link for this essay from August, 2007, in .htm format: 070831_US_CofC_education_report_card.htm

Click this link for the same essay in .pdf format: 070831_US_CofC_education_report_card.pdf

The 2007 US Chamber of Commerce K-12 Education Report Card






Topic 2

When it comes to U.S. presidential elections, is it true that 78 million of us do not exist?

We Americans have not acknowledged how undemocratic our U.S. presidential election system is.

It can be shown that the U.S. presidential election system has the same effect as ignoring 78 million Americans in the 25 most populous states, or 64 million Americans in the 21 most populous states. It ignores more people in the 21 most populous states than the total population of the other 29 states plus Washington D.C.

To eliminate this undemocratic effect, one solution would be to pass a constitutional amendment to drop the 2 "Senate" electors allocated to each state. Click the link above for this essay from August, 2004.


But, given the difficulty of passing a constitutional amendment, the state-by-state compact proposed by National Popular Vote ( www.nationalpopularvote.com ) now seems a more realistic alternative. See also Topic 3, above.





Topic 1

Interested in how speech works?

Even if you hold your hand in front of your mouth, a person who is listening to you can tell whether you are saying, for example, "ABBA" or "ADDA". To explain how this is possible, we need to model the relation between sound propagation in an acoustic tube (e.g. in our mouth) and formant-based acoustic cues for the phonetic dimension of place of articulation. Click the link above for this technical report from May, 1987.
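As a very rough starting point for thinking about tubes and formants, here is the textbook case of a uniform tube closed at one end and open at the other, whose resonances fall at odd multiples of c/(4L). The tube length, the speed of sound and the function name are assumptions for illustration. This simplification is not the model in the technical report, and it does not by itself explain place of articulation.

```python
SPEED_OF_SOUND = 350.0  # m/s, an assumed value for warm, moist air


def tube_resonances(length_m, count=3):
    """Resonances in Hz of a uniform tube closed at one end, open at the other."""
    return [(2 * n - 1) * SPEED_OF_SOUND / (4.0 * length_m)
            for n in range(1, count + 1)]


# An assumed vocal-tract length of 17.5 cm.
for n, freq in enumerate(tube_resonances(0.175), start=1):
    print(f"resonance {n}: about {freq:.0f} Hz")
```

For this length the resonances come out near 500, 1500 and 2500 Hz. Narrowing the tube at different places along its length shifts these resonances in different ways, which is the kind of relation a fuller model has to capture.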







Regards,

Gary Kuhn, Ph.D.
St Martin Systems, Inc.

