Occasional papers from Gary M. Kuhn, Ph.D.


Table of Contents
Topic 8 Do you need fast training for your multi-layer networks?
Topic 7 Are you interested in comparing feed-forward and recurrent sensitivities in speech recognition?
Topic 6 Has the US Republican Party lost its connection to what we like and admire about business?
Topic 5 Interested in the JUPITER Crestor study and an opportunity for health-care reform in the US?
Topic 4 Want a four-move solution to Rubik's cube?
Topic 3 How does the 2007 US Chamber of Commerce education report card relate to a Pandora's box of population-based problems for the US and a democratic opportunity for US business?
Topic 2 When it comes to U.S. presidential elections, is it true that 78 million of us do not exist?
Topic 1 Interested in how speech works?


TOPIC 8: Do you need fast training for your multi-layer networks?
DATE POSTED: 20100117
TITLE: Variations on Training of Recurrent Networks

This paper is about training differentiable models to take inputs and to approximate, or "regress to", expected or desired outputs.

Sometimes we train a model just to have a general equation that relates inputs to outputs. Sometimes we train a model to diagnose how the outputs depend on the inputs. We put data in, we usually get expected data out, and then we say "Now I am going to diagnose how these outputs depend on those inputs under the current model, and maybe I can make the model better!" Sometimes we train a model to predict specific outputs. We put in today's data, we get expected data out, and we see "Ah hah, this is what would (or will) happen in today's case!"

These goals may sound nice, but are they realistic? How easy is it to load enough of the information in a large body of data onto just a few numbers, the few so-called "parameters" of a model? How easy is it to get good-enough results out? The answer is that sometimes it is easy, sometimes it is hard, and sometimes it is impossible. There is a whole science about the difficulty of loading different types of information onto different types of models.

Here is an easy case. Take (x,y) pairs that lie more-or-less on a straight line, and train the best-fitting linear model, y = ax + b. We only have to find two parameter values, "a" and "b", to get a good approximation of the desired or expected outputs.

Here is an impossible case. Take time-ordered (x,y) pairs measured from the trajectory of an iron ball fired into the air from a cannon, and train the best-fitting linear model again. The model is too simple: the trajectory curves up and back down, so even the best-fitting straight line cannot do a good job of approximating these y's as a function of x.
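To make the easy and impossible cases concrete, here is a minimal sketch in Python (not code from the paper) that fits y = ax + b by ordinary least squares to two hypothetical data sets: points scattered about the line y = 2x + 1, and heights along a cannon-ball-like parabola, y = 10x - x^2. The function names and the data are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def rms_error(xs, ys, a, b):
    """Root-mean-square distance between the data and the fitted line."""
    return (sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5


xs = [float(i) for i in range(11)]
# Easy case (hypothetical): points near y = 2x + 1, with a small alternating wiggle
line_ys = [2.0 * x + 1.0 + (0.1 if i % 2 else -0.1) for i, x in enumerate(xs)]
# Impossible case (hypothetical): a cannon-ball-like parabola, height = 10x - x^2
ball_ys = [10.0 * x - x * x for x in xs]

a_line, b_line = fit_line(xs, line_ys)
a_ball, b_ball = fit_line(xs, ball_ys)
print(f"line:   a={a_line:.2f}  b={b_line:.2f}  rms={rms_error(xs, line_ys, a_line, b_line):.3f}")
print(f"cannon: a={a_ball:.2f}  b={b_ball:.2f}  rms={rms_error(xs, ball_ys, a_ball, b_ball):.3f}")
```

For the straight-line data the fit recovers a = 2 and b close to 1 with an error of about 0.1. For the parabola the best line is flat (a = 0, b = 15) and its root-mean-square error stays near 9, because no choice of the two parameters can follow the curve.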
Between the easy cases and the impossible cases lie the rest of the possible cases, where a model has at least enough of the right degrees of freedom, and where there is enough data to learn to approximate, or "regress", the inputs to the expected or desired outputs. However, these harder cases, which are possible in theory, may still be impossible in practice, in particular because our approach to training might not be good enough.

Suppose the way we initialize our model keeps it from learning to be a good predictor of the expected or desired outputs. Now what could we do?

Suppose our training algorithm takes too long. What if we need to train and apply a new model on each day's data, but training takes longer than a day? Now what?

Suppose our model has too many degrees of freedom. How could we reduce the tendency of such a model to overfit the data? How could we make our model adapt itself to an effectively smaller and better form for solving our problem?

Suppose our application is to train a model on continuously varying inputs to produce discretely changing outputs, for example correct results in a classification task. The right answer in one four-way classification might be [1,0,0,0], not [0,1,0,0], [0,0,1,0] or [0,0,0,1]. How could we train our model to do a better job of discriminating right answers from wrong answers, instead of just approximating right answers?

Suppose it is hard to avoid putting into our model at least some data that should be treated as irrelevant. What if our model learns to match the desired outputs for that irrelevant data better by doing worse on the part of the data that we really care about?

Suppose our application is to produce discretely changing, classification-like outputs, but it would be hard for the model to learn to map its continuously varying inputs to a discrete, desired or "target" output value like [1,0,0,0] that we supply externally.
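The four-way classification question above can be made concrete. The Python sketch below takes two hypothetical output vectors for a spoken "b" (using the four letter names the paper discriminates) and compares two scores: the squared error against the one-hot target [1,0,0,0], and a simple discrimination score, the right answer's output minus the largest wrong answer's output. This "margin" is an illustrative assumption, one simple way to measure discrimination; it is not necessarily the criterion used in the paper.

```python
LABELS = ["b", "d", "e", "v"]


def one_hot(label):
    """Discrete target, e.g. one_hot("b") == [1.0, 0.0, 0.0, 0.0]."""
    return [1.0 if name == label else 0.0 for name in LABELS]


def squared_error(outputs, target):
    """Sum of squared differences: how well the outputs approximate the target."""
    return sum((o - t) ** 2 for o, t in zip(outputs, target))


def margin(outputs, label):
    """Right answer's output minus the largest wrong answer's output (> 0 means correct)."""
    i = LABELS.index(label)
    return outputs[i] - max(o for j, o in enumerate(outputs) if j != i)


def classify(outputs):
    """Pick the label whose output is largest."""
    return LABELS[max(range(len(outputs)), key=lambda j: outputs[j])]


target = one_hot("b")
out_a = [0.5, 0.4, 0.4, 0.4]   # hypothetical outputs: "b" wins narrowly
out_b = [0.5, 0.55, 0.0, 0.0]  # hypothetical outputs: closer to the target, but "d" wins
for name, out in (("out_a", out_a), ("out_b", out_b)):
    print(name, classify(out), round(squared_error(out, target), 4), round(margin(out, "b"), 4))
```

Here out_b has the smaller squared error (0.5525 versus 0.73), yet it picks the wrong letter; out_a approximates the target less well but has a positive margin. A criterion that only rewards approximating the right answer can prefer out_b, which is the gap between approximating and discriminating.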
Are there alternatives to using an externally supplied, discrete target function?

This paper addresses these practical problems for multi-layer network models, both feed-forward and recurrent, in an application to speech recognition. All models in this paper are based on a superposition of logistic "sigmoid" functions, each of the form y = 1/(1+e^(-x)). All are trained to discriminate the letter names "b", "d", "e" and "v", as spoken by different talkers.

Warning: these suggestions may not be easy to implement. Until we debug our implementations, they will not work as we want. After some amount of work we may ask ourselves "Why are we bothering?" We should remember the enjoyable benefit of taking something that seems impossible in practice and turning it into something readily doable: if we persist, the model may not be the only thing that learns; we ourselves may learn as well. So persist, learn and enjoy!

Could suggestions from this paper help produce good results in your applications? To read this paper, co-authored with my friend Dr. Norman Herzberg, click on the TITLE above or on this link: 100117_Variations_on_Training.pdf

Figure 1. A network with 652 parameters is trained in 100 iterations through the data.
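As a baseline for the training questions in this topic, here is a minimal sketch, in Python, of plain batch gradient descent on mean squared error for the smallest sigmoid model, y = w2*f(w1*x) with f(z) = 1/(1+e^(-z)). It uses the standard identity f'(z) = f(z)(1 - f(z)) to compute the gradient. The data, starting weights, learning rate and epoch count are hypothetical, and this is a generic baseline, not one of the paper's training variations.

```python
import math


def sigmoid(z):
    """Logistic function f(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, w1, w2):
    """Smallest sigmoid model: y = w2 * f(w1 * x)."""
    return w2 * sigmoid(w1 * x)


def mse(data, w1, w2):
    """Mean squared error over (input, target) pairs."""
    return sum((predict(x, w1, w2) - t) ** 2 for x, t in data) / len(data)


def train(data, w1=0.5, w2=0.5, rate=0.1, epochs=2000):
    """Batch gradient descent on mean squared error, using f'(z) = f(z) * (1 - f(z))."""
    n = len(data)
    for _ in range(epochs):
        g1 = g2 = 0.0
        for x, t in data:
            s = sigmoid(w1 * x)
            err = w2 * s - t
            g2 += 2.0 * err * s / n                      # d(MSE)/d(w2)
            g1 += 2.0 * err * w2 * s * (1.0 - s) * x / n  # d(MSE)/d(w1)
        w1 -= rate * g1
        w2 -= rate * g2
    return w1, w2


# Hypothetical training data generated by w1 = 2, w2 = 3 on x in [-2, 2]
data = [(k / 4.0, predict(k / 4.0, 2.0, 3.0)) for k in range(-8, 9)]
before = mse(data, 0.5, 0.5)
w1, w2 = train(data)
after = mse(data, w1, w2)
print(f"MSE before {before:.4f}, after {after:.6f}, w1={w1:.3f}, w2={w2:.3f}")
```

Several of the practical problems listed in this topic show up as explicit choices in this loop: the starting weights (initialization), the learning rate and number of epochs (training time), and the number of parameters (degrees of freedom).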
TOPIC 7: Are you interested in comparing feed-forward and recurrent sensitivities in speech recognition?
DATE POSTED: 20090716
TITLE: Comparison of feed-forward and recurrent sensitivities in speech recognition

This paper is about diagnosing systems whose inputs and outputs vary over time.

In a "feed-forward" system, the effects of a given input pass through the system in a finite amount of time, after which that input no longer affects the outputs. For example, a healthier live chicken is taken to a packing plant, and some time later a better piece of chicken meat comes out in a shrink-wrapped package.

In a "feed-back" or "recurrent" system, previous outputs are included among later inputs, so an input at one point in time can potentially affect outputs forever after. For example, a healthier chicken is put into a breeding process, and more and healthier chickens may then be available in the following cycles of the breeding process.

In this paper, both types of systems are used, and both are diagnosed. When we say "diagnose", we mean diagnosing the dependence of outputs on earlier inputs. One way to do this is called calculating the "sensitivity". Here is a technical question that can be answered by calculating the sensitivity: if we make a small increase in one of the system inputs at a given point in time, how much increase or decrease will we see in each of the system outputs at all following points in time?

One way to calculate sensitivities is to train what mathematicians call a "differentiable model". In a differentiable model, outputs, called "y", are related to inputs, called "x", by a differentiable function, called "f". We say "y equals f of x", and we write "y = f(x)". This notation may be familiar to you from high-school algebra. To train a differentiable model, we choose a structure for the model and train its weights, "w", to approximate desired outputs.
Maybe we chose a structure like "y = f(w*x)", maybe we chose "y = w*f(x)", or maybe we chose both, with "y = w2*f(w1*x)". Maybe the model is a cascade of such structures. I have posted a paper above (Topic 8) about training such models. Let me point out again that, having structured and trained a model, we can diagnose the input-output dependencies of the trained model. Note, however, that when a system undergoes change, we may have to re-train the model.

Why bother with any of this? Because once we have modeled how a system's outputs depend on its inputs, we can predict the outputs, or we can try to modify the inputs or the structure of the system to improve the outputs. This process of modeling how a system works, and how it might be modified to work better, can contribute to scientific discovery. Applications abound, in both physical science and social science. Maybe we are studying chickens, maybe we are studying market responses, or maybe we are studying the education of children who will grow up to be voting adults.

You can learn about differentiable models and about approximating desired outputs in courses on differential calculus and applied regression analysis. With this kind of education, you can find a job in a technical branch of a company or of a government. Should we be interested in making these kinds of scientific discoveries? Should we be interested in these kinds of jobs? If you think so, you can see why taking math at least through differential calculus and applied regression analysis might be a good investment for our future.

This paper is an application of sensitivity analysis to speech recognition, co-authored with my friend Dr. Raymond Watrous. To read this paper, click on the TITLE above or on this link: 090716_Comparison_of_Sensitivities.pdf

Figure 17.2. Features and feed-forward sensitivities for poorly discriminated training examples.
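To make the sensitivity question concrete, here is a minimal sketch in Python with two hypothetical systems: a feed-forward one, where each output is a weighted sum of the current and two previous inputs, and a recurrent one, where each output is fed back into the next. The weights 0.5, 0.3, 0.2 and the feedback 0.8 are assumptions, and both systems are linear only for readability. The finite-difference probe (bump one input by a small amount and divide the change in every later output by that amount) works on any differentiable model, including sigmoid networks; the paper's own way of computing sensitivities may differ.

```python
def feed_forward(xs, w=(0.5, 0.3, 0.2)):
    """Hypothetical feed-forward system: y[t] = w[0]*x[t] + w[1]*x[t-1] + w[2]*x[t-2]."""
    return [sum(wk * xs[t - k] for k, wk in enumerate(w) if t - k >= 0)
            for t in range(len(xs))]


def recurrent(xs, a=0.8, b=1.0):
    """Hypothetical recurrent system: y[t] = a*y[t-1] + b*x[t] (previous output fed back)."""
    ys, prev = [], 0.0
    for x in xs:
        prev = a * prev + b * x
        ys.append(prev)
    return ys


def sensitivity(system, xs, t_in, eps=1e-6):
    """Finite-difference estimate of d y[t] / d x[t_in] for every output time t."""
    bumped = list(xs)
    bumped[t_in] += eps
    return [(p - q) / eps for p, q in zip(system(bumped), system(xs))]


xs = [0.0] * 10
for name, system in (("feed-forward", feed_forward), ("recurrent", recurrent)):
    print(name, [round(s, 3) for s in sensitivity(system, xs, t_in=2)])
```

The feed-forward sensitivities are 0.5, 0.3 and 0.2 for the three outputs starting at the bumped time and zero afterwards, matching the finite passage time described above. The recurrent sensitivities decay as 0.8 to the power k but never reach zero.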
TOPIC 6: Has the US Republican Party lost its connection to what we like and admire about business?
DATE POSTED: 20090524
TITLE: Has the US Republican Party lost its connection to what we like and admire about business?

The states that voted Republican in the 2008 US presidential election tend to be lower in per capita GDP, lower in K-12 education, and lower in public health. This statement will come as a surprise to some and as an outrage to others. What, if any, are its implications for US voters and for the Republican Party?

I defend this statement by reviewing three sets of data: per capita GDP data from the "BEA", or US Bureau of Economic Analysis; K-12 education scores from the "CofC", or US Chamber of Commerce; and public health scores from the "UHF", or United Health Foundation.

To read this essay in .pdf format, click on the TITLE above or on this link: 090524_Republican_Party.pdf
To read this essay in .htm format, click this link: 090524_Republican_Party.htm

Figure 1. States ordered by 2008 Bureau of Economic Analysis per capita GDP.

Comments:
AMEN!!! TD, May 25, 2009
Interesting, very GKuhn-like data. But the question you allow yourself to ask is not exciting. There is a conclusion that is begging to be made from this, and I wish I knew what it is. SJ, May 26, 2009
Indeed, the graph is very interesting. As you say, causality is another matter, but one is tempted to make a connection: low GDP -> low education -> conservative religious -> Republican. I may be perfectly wrong, or a victim of some prejudice. PI, May 26, 2009
TOPIC 5: Interested in the JUPITER Crestor study and an opportunity for health-care reform in the US?
DATE POSTED: 20090309
TITLE: From the JUPITER Crestor study to an opportunity for health-care reform in the US

The recent "JUPITER" Crestor study showed that the statin drug Crestor cut the risk of heart attack and stroke for people with normal cholesterol but elevated C-reactive protein (an indicator of inflammation in the arteries). After a lengthy discussion of the results of this study, I review:
- some life-style changes that I made to reduce inflammation and the risk of blood clots,
- some potent compounds that we can choose to ingest for the same purposes, from so-called "food", and
- an opportunity for health-care reform in the US, which opens up a whole list of population-based problems that we need to address in the US.

This paper was the basis for a discussion held on March 9, 2009 with Dr. Majid Ali, on WBAI radio, 99.5 FM, in New York.

Click on the TITLE above or on this link for this essay in .pdf format: 090309_for_Majid_Ali.pdf
Click on this link for the same essay in .htm format: 090309_for_Majid_Ali.htm

Figure 1. Statistics for 50 US states and Washington DC. Left: K-12 education rankings, source: US Chamber of Commerce, 2007. Right: Health-care scores, source: United Health Foundation, 2008, graphed by Time Magazine.
TOPIC 4: Want a four-move solution to Rubik's cube?
DATE POSTED: 20071130
TITLE: A Four-Move Solution to Rubik's Cube

This document describes a four-move solution for three versions of the cube puzzle originally sold by Ideal Toys under the name "Rubik's Cube". One version of the puzzle is the original 3x3x3 cube, which has colored squares on its sides. The colors are white, green, orange, blue, red and yellow. The second version is the 3x3x3 cube for the visually handicapped sold by LS&S Group. This version has raised symbols on its sides. The symbols are a triangle, a hollow circle, a hollow square, a filled circle, a filled square and a letter "x". The third version is the 2x2x2 cube puzzle sold by Winning Moves, which looks like the head of Matt Groening's cartoon character Homer Simpson.

We use the 3x3x3 puzzle for the visually handicapped to describe the solution. Holding the cube straight up, we can name its sides as follows. There is U, the side which is up; D, the side which is down; L, the side to the left; R, the side to the right; F, the side in front; and B, the side in back. To repeat, the sides are U, up; D, down; L, left; R, right; F, front; and B, back.

We can also name the internal "wheels" of the cube. Wheel X is the second horizontal row of the cube, which we can think of as a wheel that goes from the front, to the right, to the back, to the left side. Wheel Y is the second column of the cube as viewed from the front, which we can think of as a wheel that goes from the front, to the up, to the back, to the down side. Wheel Z is the second column of the cube as viewed from the right, which we can think of as a wheel that goes from the right, to the up, to the left, to the down side.

Our overall plan is the following. First, do the corners of U. Second, do the edges of U. Third, do the corners of D. Fourth, get the edges of D in the correct position. Fifth, get the edges of X in the correct position.
Sixth, get the edges of D and X in the correct orientation. These six steps are the subject of the six sections of this paper.

While there are six steps in this solution, it is sufficient to know four "moves", or sequences of rotations. Move 1 interchanges corners. Move 2 re-orients corners in place. Move 3 interchanges edges. And move 4 re-orients edges in place. Even though we use four moves, this paper is organized in terms of the six steps, because the first side of the cube, i.e. the corners and edges of U, can be done readily without resorting to memorized moves. In the "Discussion and Conclusion" section of this paper, we review the four moves and suggest ways to make each move easier to memorize.

Click on the TITLE above or on this link for this paper in .pdf format: 071130_cube.pdf

Figure 1. Left: the original 3x3x3 cube. Middle: the 3x3x3 cube for the visually handicapped. Right: the 2x2x2 cube that looks like Homer Simpson.
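One common way to reason about a "move" is to treat it as a permutation of pieces. The Python sketch below names the eight corners by the three sides they touch, using the U, D, L, R, F, B names above, and applies a hypothetical corner 3-cycle on U. It is not Move 1 from the paper, whose actual rotation sequences are given there; it only shows the bookkeeping, and that repeating such a move three times returns every corner home.

```python
CORNERS = ["UFL", "UFR", "UBR", "UBL", "DFL", "DFR", "DBR", "DBL"]


def apply(state, move):
    """Return the state after a move.

    state maps each corner position to the piece currently there;
    move maps a position to the position its piece is carried to.
    """
    new_state = dict(state)
    for src, dst in move.items():
        new_state[dst] = state[src]
    return new_state


def order(move):
    """How many times the move must be repeated to return every piece home."""
    solved = {p: p for p in CORNERS}
    state, n = apply(solved, move), 1
    while state != solved:
        state, n = apply(state, move), n + 1
    return n


# Hypothetical corner 3-cycle on U: UFL -> UFR -> UBR -> UFL, other corners fixed
CYCLE = {"UFL": "UFR", "UFR": "UBR", "UBR": "UFL"}
solved = {p: p for p in CORNERS}
once = apply(solved, CYCLE)
print(once["UFR"], once["UBR"], once["UFL"])  # pieces now sitting in those positions
print(order(CYCLE))
```

A 3-cycle is used rather than a swap of two corners because, on a real 3x3x3 cube, no sequence of rotations can swap just two corners while leaving every other piece in place.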
TOPIC 3: How does the 2007 US Chamber of Commerce education report card relate to a Pandora's box of population-based problems for the US and a democratic opportunity for US business?
DATE POSTED: 20070831
TITLE: The 2007 US Chamber of Commerce Education Report Card, a political Pandora's box, and a democratic opportunity for US business

According to a 2007 US Chamber of Commerce (CofC) study, the measures of the shortcomings of the US K-12th grade education system "are stark indeed". We analyze a state-based summary of this study and find that the results are even worse when viewed in population-based terms. By comparing results in state-based, elector-based and population-based terms, we quantify the insensitivity of the US presidency, the US Senate and the US constitutional amendment process to population-based problems, and an inability of the US government to adapt to such problems.

We name a large number of population-based problems that the US must address, which include, in addition to K-12th grade education: health care, illegal drugs, violent crime, homeland security, retraining of unemployed adults, support for retirement, and the depletion of natural resources like oil. We argue that it is urgent for the US to address these problems, which means that the US must respond promptly to the double challenge of (1) poor K-12th grade education, and (2) the broader political Pandora's box of an unrepresentative government with additional, urgent, population-based problems.

This double challenge presents a democratic opportunity for business, which has historically benefited from the democratization of government. A first step would be to support full, population-based selection of the US president, by a system of one person, one vote.
The next question is how to accomplish this step promptly while the normal way of not changing the US Constitution, namely the constitutional amendment process, is still in place. Our answer is to help pass a state-by-state compact called "National Popular Vote" (www.nationalpopularvote.com). We ask the US Chamber of Commerce and US business in general to announce their support for National Popular Vote, and we foresee that further steps, like population-based voting in the US Senate and population-based voting for amendments to the US Constitution, could be built on business's success in this first step.

Click this link for this essay from August 2007 in .htm format: 070831_US_CofC_education_report_card.htm
Click this link for the same essay in .pdf format: 070831_US_CofC_education_report_card.pdf

Figure 1. The score (height) and grade (color) for K-12 education in 50 US states and Washington DC. Source: US Chamber of Commerce, 2007.

Figure 4. The scored (height) and graded (color) US states plus Washington DC, ordered by population (left to right) and with width proportional to representation (top) in the US Senate, (middle) by US presidential electors, or (bottom) in the US House of Representatives.

Why do the people in the 21 most populous states (through Minnesota, MN) get only 93% of the presidential electors that they should have by population? Why do the people in the 29 least populous states plus Washington DC (starting with Louisiana, LA) get 129% of the presidential electors that they should have by population? Why do the 68% of the US population who live in the 16 most populous states (through Tennessee, TN) get only 32% of the vote in the US Senate? Why do the 103 million people in the five most populous states (CA, TX, NY, FL and IL) have an average K-12 education grade of D+? In the US federal system, does under-representation of the people in the most populous states produce an insensitivity to their population-based needs, such as education?
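The Senate question above is simple arithmetic. Here is a minimal Python sketch, assuming a representation weight defined as share of seats divided by share of population, so that 1.0 means one person, one vote; the function name is illustrative.

```python
def weight(seat_share, population_share):
    """Share of seats divided by share of population; 1.0 means one person, one vote."""
    return seat_share / population_share


# From the text: the 16 most populous states hold 68% of the US population,
# and each state has 2 of the 100 senators.
senate_share = 16 * 2 / 100
print(f"Senate share: {senate_share:.0%}, weight: {weight(senate_share, 0.68):.2f}")
```

A weight of about 0.47 means that the people in those 16 states have less than half the Senate representation that their population alone would give them.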
TOPIC 2: When it comes to U.S. presidential elections, is it true that 78 million of us do not exist?
DATE POSTED: 20040806
TITLE: When it comes to U.S. presidential elections, is it true that 78 million of us do not exist?

We Americans have not acknowledged how undemocratic our U.S. presidential election system is. It can be shown that the U.S. presidential election system has the same effect as ignoring 78 million Americans in the 25 most populous states, or 64 million Americans in the 21 most populous states. It ignores more people in the 21 most populous states than the total population of the other 29 states plus Washington D.C. This statistic is explained and illustrated in Figure 1, in the paper and below.

To eliminate this undemocratic effect, one solution would be to pass a constitutional amendment to drop the 2 "Senate" electors allocated to each state. But, given the difficulty of passing a constitutional amendment, the state-by-state compact proposed by National Popular Vote now seems a more realistic alternative (www.nationalpopularvote.com). See also the discussion of Figures 4 and 6 in Topic 3, above.

Click on the TITLE above or on this link for this paper in .htm format: 070114_elector.htm

Figure 1. The 21 most populous U.S. states get 70.6% of the U.S. presidential electors for their 77.4% of the U.S. population, a down-weighting to 91.3%. The 29 least populous states plus Washington DC get 29.4% of the electors for their 22.6% of the population, an over-weighting to 129.7%. The ratio of 91% to 129% is 70%. In other words, if the weighting of people in the 29 least populous states plus Washington DC is the unit of comparison, the people in the 21 most populous states are down-weighted by a factor of 0.70, with an effective loss of population of 64 million people, a loss of more people than exist in the rest of the country.
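The weights in the Figure 1 caption can be recomputed from its rounded percentages, as in the Python sketch below (variable names are illustrative). Working from the rounded shares gives about 91.2% and 130.1%, slightly different from the caption's 91.3% and 129.7%, presumably because the paper works from unrounded counts; that is an assumption. The total US population is not stated here, so the effective loss is expressed as a share of the population rather than in millions.

```python
# Shares from the Figure 1 caption (rounded to 0.1%)
big_pop, big_elec = 0.774, 0.706      # 21 most populous states
small_pop, small_elec = 0.226, 0.294  # 29 least populous states plus Washington DC

w_big = big_elec / big_pop          # electors relative to population share
w_small = small_elec / small_pop
ratio = w_big / w_small             # big-state weighting in units of the small-state weighting
lost_share = big_pop * (1 - ratio)  # share of the US population effectively ignored

print(f"{w_big:.1%} {w_small:.1%} ratio={ratio:.2f} lost={lost_share:.1%}")
```

The effectively ignored share, about 23.1% of the US population, exceeds the 22.6% who live in the 29 least populous states plus Washington DC, which is the sense of the caption's last sentence.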
TOPIC 1: Interested in how speech works?
DATE POSTED: 20070629
TITLE: From Acoustic Tube to Acoustic Cues

Even if you hold your hand in front of your mouth, a person who is listening to you can tell whether you are saying, for example, "/aba/" or "/ada/". To explain how this is possible, we can model the relation between sound propagation in an acoustic tube (e.g. in our mouth) and formant-based acoustic cues for the phonetic dimension of place of articulation.

In this paper, following Webster's classic wave equation, we model an acoustic tube as a pressure source exciting an acoustic filter. Starting with this physical model, here are some problems that we can solve:
1. Find the zeroes over frequency of the pressure at the lips, i.e. the formant frequencies.
2. Relate constriction of the uniform tube to changes in formant frequencies.
3. Relate changes in formant frequencies to acoustic cues for place of articulation.
4. Relate constriction of other tubes to changes in formant frequencies.
5. Relate changes in formant frequencies to rules for speech synthesis that were derived from listening experiments.

Figure 5, in the paper and below, shows modeled formant frequency transitions for the first three formants, for six symmetric intervocalic utterances /vowel - constriction - vowel/, and for constriction on each centimeter-long section of a 16-centimeter-long modeled tube. The formant frequency transitions are measured as changes, in Hz, from the steady-state formant frequencies of the adjacent vowel. The changes are displayed at the bottom of the graph for the 1st formant, in the middle for the 2nd formant, and at the top for the 3rd formant. The six symmetric intervocalic utterances are /i - constriction - i/, /e - constriction - e/, /õ - constriction - õ/, /schwa - constriction - schwa/, /o - constriction - o/, and /a - constriction - a/.
A centimeter-long constriction was modeled on each centimeter-long section of the modeled tube representing the vowel. The intervocalic utterances /vowel - b - vowel/ are modeled by a constriction on the centimeter-long section of the tube at the "lips", the one that starts 0 cm back from the opening of the tube. The graph suggests that these utterances are cued by a downward transition from the vowel formant frequency and then a return, for each of the three formants, in response to the constriction and its release. The utterances /vowel - d - vowel/ are modeled by a constriction on the tube section that starts 3 centimeters back from the opening of the tube. The graph suggests that these utterances are cued by a downward transition and return for the 1st formant, no transition for the 2nd formant, and a strong upward transition and return for the 3rd formant, in response to this constriction and its release.

A matrix relating pressure and volume velocity to formant frequency changes, and formant frequency changes to place of constriction, or perceived place of "articulation", is given in Figure 2 of the paper. A computer program for generating formant frequencies from log area measurements is included in the paper.

Click on the TITLE above or on this link for this paper in .pdf format: 070629_tube.pdf

Figure 5. Formant transitions superimposed, /V - constriction - V/.
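For readers who want to experiment, here is a minimal Python sketch of formant estimation for a tube built from centimeter-long sections. It uses a standard lossless concatenated-tube (chain-matrix) approximation, which is an assumption here; it is not the program printed in the paper. The speed of sound (35000 cm/s), the section areas, and the ideal open end at the lips (zero pressure, no radiation load) are also assumptions. Resonances are taken where the D element of the glottis-to-lips chain matrix is zero, so that the lips-to-glottis volume-velocity ratio blows up.

```python
import math

SPEED_OF_SOUND = 35000.0  # cm/s; an assumed value for warm, moist air


def chain_d(freq, areas, section_len=1.0, speed=SPEED_OF_SOUND):
    """Real 'D' element of the lossless chain matrix from glottis to lips.

    areas[0] is the section at the glottis and areas[-1] the section at the lips
    (cm^2); each section is section_len cm long. For a lossless tube the
    off-diagonal elements are purely imaginary, so they are carried as real
    numbers b = B/j and c = C/j. Dropping rho*c from the impedance leaves D unchanged.
    """
    theta = 2.0 * math.pi * freq * section_len / speed
    s, co = math.sin(theta), math.cos(theta)
    a, b, c, d = 1.0, 0.0, 0.0, 1.0  # identity matrix
    for area in areas:
        a2, b2, c2, d2 = co, s / area, s * area, co
        a, b, c, d = (a * a2 - b * c2, a * b2 + b * d2,
                      c * a2 + d * c2, d * d2 - c * b2)
    return d


def formants(areas, n=3, f_max=5000.0, step=5.0):
    """First n zeros of D(f), found by a coarse scan plus bisection."""
    found = []
    f_lo, d_lo = step, chain_d(step, areas)
    while len(found) < n and f_lo < f_max:
        f_hi = f_lo + step
        d_hi = chain_d(f_hi, areas)
        if d_lo * d_hi < 0:  # sign change: a zero lies in [f_lo, f_hi]
            lo, hi, d_left = f_lo, f_hi, d_lo
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                d_mid = chain_d(mid, areas)
                if d_left * d_mid <= 0:
                    hi = mid
                else:
                    lo, d_left = mid, d_mid
            found.append(0.5 * (lo + hi))
        f_lo, d_lo = f_hi, d_hi
    return found


uniform = [5.0] * 16              # hypothetical 16-cm tube, 5 cm^2 throughout
closed_lips = [5.0] * 15 + [0.3]  # same tube, narrowed in the section at the lips
print([round(f) for f in formants(uniform)])
print([round(f) for f in formants(closed_lips)])
```

For the uniform tube the sketch gives about 547, 1641 and 2734 Hz, the familiar (2n - 1)c/4L pattern of a tube closed at one end and open at the other. Narrowing the section at the lips lowers all three resonances in this model, in line with the downward transitions described above for /vowel - b - vowel/.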
About the author: Dr. Kuhn is a former head of the Adaptive Information and Signal Processing Department of Siemens Corporate Research, and currently President of St Martin Systems, Inc., both located in Princeton, New Jersey.
Regards,

Gary Kuhn, Ph.D.
St Martin Systems, Inc.
