TIES basic complete run (rev A)


This app trains and tests the full TIES coupled-oscillator model of emotional interaction. Using cross-validation, it learns shared parameters for the oscillator and tests the resulting models. Documentation for the underlying tool is at http://www.compties.org/

Quick Start

Input Data

The input data file contains measurements and other information about participants, who are paired into dyads (pairs, couples).  The file contains both time-varying and time-invariant data.

The input data file is organized as a table of numbers, stored in comma-separated values (CSV) format.  (Spreadsheets such as Excel and OpenOffice can export data in this form.)  The first row stores text labels, and rows 2 and below hold numeric data for the participants.  Each row stores information about one participant at one moment in time.  Each column stores one category of data.

The text labels in the first row describe the data categories.  Each entry in rows 2 and below should be either a number or the special string "NA" to indicate a missing datum.  Two column labels are mandatory:  Dyad and time.

Required column labels (and meanings) 

Two of the columns must have the exact labels Dyad and time.  The labels are case-sensitive.

Other required columns

At least two more columns are required, but the labels are up to you:

For more discussion about the differences between moderator and grouping-variable factors, please see below, Moderator or Grouping Variable?

Extra columns are ignored, and thus you may store all your measurements in one file.  Unreferenced columns have no effect on the model.

Every column in the file should have a row-1 label composed solely of letters, numbers, and underscore characters.  Do not use spaces, punctuation, or other characters.  Each label must be unique.
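The labeling rules above can be checked before uploading a file. The following is a minimal sketch in Python, not part of the TIES tool itself: the function name check_table is hypothetical, and it assumes that "letters" means ASCII letters and that any entry Python can parse as a float counts as a number.

```python
import csv
import io
import re

REQUIRED_LABELS = {"Dyad", "time"}            # case-sensitive, per the rules above
LABEL_PATTERN = re.compile(r"[A-Za-z0-9_]+")  # assumption: ASCII letters, digits, underscores


def check_table(csv_text):
    """Return a list of human-readable problems found in the CSV text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    labels, problems = rows[0], []
    for missing in sorted(REQUIRED_LABELS - set(labels)):
        problems.append(f"missing required column label: {missing}")
    for label in labels:
        if not LABEL_PATTERN.fullmatch(label):
            problems.append(f"bad label: {label!r}")
    if len(set(labels)) != len(labels):
        problems.append("labels are not unique")
    # Rows 2 and below: every entry must be a number or the string NA.
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(labels):
            problems.append(
                f"row {line_no}: expected {len(labels)} entries, found {len(row)}"
            )
        for entry in row:
            if entry == "NA":
                continue
            try:
                float(entry)
            except ValueError:
                problems.append(f"row {line_no}: not a number or NA: {entry!r}")
    return problems
```

An empty list means the sketch found nothing to complain about; it does not guarantee that the app will accept the file.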

Test data for this app appears directly in the Discovery Environment in the Data window under Community Data -> iplantcollaborative -> example_data -> ties_basic_complete_run
Input File(s)

The fictional example below shows the format.

Dyad  is_mother  age_years  conflict  resp_rate  dial  time
2     0          17.3       4.2       0.84       0     1
2     0          17.3       4.2       0.72       0.2   2
2     0          17.3       4.2       0.70       -0.4  3
2     0          17.3       4.2       0.83       -0.1  4
2     0          17.3       4.2       0.87       0.3   5
3     0          19.4       1.1       0.50       0.3   1
3     0          19.4       1.1       0.55       0.3   2
3     0          19.4       1.1       0.57       0.4   3
3     0          19.4       1.1       0.55       0.2   4
3     0          19.4       1.1       0.56       -0.1  5
3     0          19.4       1.1       0.55       -0.1  6
7     0          16.7       2.3       0.70       -0.3  1
7     0          16.7       2.3       0.77       -0.2  2
7     0          16.7       2.3       0.80       -0.1  3
7     0          16.7       2.3       0.78       0.2   4
2     1          39.2       4.2       0.73       0.1   1
2     1          39.2       4.2       0.74       0.2   2
2     1          39.2       4.2       0.73       0.2   3
2     1          39.2       4.2       0.72       0.1   4
2     1          39.2       4.2       0.73       0.0   5
3     1          44.6       1.1       0.68       -0.4  1
3     1          44.6       1.1       0.69       -0.5  2
3     1          44.6       1.1       0.65       -0.4  3
3     1          44.6       1.1       0.60       0.2   4
3     1          44.6       1.1       0.58       0.3   5
3     1          44.6       1.1       0.62       0.1   6
7     1          41.1       2.3       0.77       -0.2  1
7     1          41.1       2.3       0.80       -0.1  2
7     1          41.1       2.3       0.78       0.2   3
7     1          41.1       2.3       0.75       0.1   4

If you open the CSV file in a text editor, the first few lines of the above example would look something like this:

 

"Dyad","is_mother","age_years","conflict","resp_rate","dial","time"
2,0,17.3,4.2,0.84,0,1
2,0,17.3,4.2,0.72,0.2,2
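To see how the one-row-per-participant-per-moment layout maps onto time series, the sketch below reads a few rows of the example and groups one measurement by dyad and dyad member. It is only an illustration in Python: the function name series_by_person is hypothetical, and treating is_mother as the column that tells the two dyad members apart is an assumption based on the example.

```python
import csv
import io
from collections import defaultdict

# A few rows copied from the fictional example table.
SAMPLE = '''"Dyad","is_mother","age_years","conflict","resp_rate","dial","time"
2,0,17.3,4.2,0.84,0,1
2,0,17.3,4.2,0.72,0.2,2
2,1,39.2,4.2,0.73,0.1,1
2,1,39.2,4.2,0.74,0.2,2
'''


def series_by_person(csv_text, person_col, value_col):
    """Map (dyad, person) to a time-sorted list of (time, value); NA becomes None."""
    series = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = None if row[value_col] == "NA" else float(row[value_col])
        series[(row["Dyad"], row[person_col])].append((float(row["time"]), value))
    # Sort on time only, so that None values never take part in a comparison.
    return {key: sorted(points, key=lambda p: p[0]) for key, points in series.items()}


if __name__ == "__main__":
    for key, points in series_by_person(SAMPLE, "is_mother", "resp_rate").items():
        print(key, points)
```

The keys are kept as strings exactly as they appear in the file, so a dyad labeled 2 is looked up as "2".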

App Parameters

Before you launch the application, you will be prompted for several items:

Section 1 contains the prompts below:

Moderator or Grouping Variable?

If you are testing the hypothesis that some known per-individual time-invariant factor can help explain the observed oscillations in your data, then you should use either a moderator or a grouping variable. But which one?

A grouping variable is intended for discrete values that describe categories lacking a natural order.   For example, if you hypothesize that an individual's first language helps explain your data, you might record categorical values for each individual indicating 0=Cantonese, 1=Catalan, 2=Korean, 3=Urdu, etc.   The order of these numbers is meaningless: whatever the "average" of Catalan and Urdu might be, it is not Korean, regardless of the fact that Average{1, 3} = 2.   Although this fact is obvious to a human interpreter, it would not be obvious to the computer -- it must be told.  By indicating that 1st_language is a grouping variable, you explicitly tell the TIES modeler not to rely on order properties.

The TIES modeler uses grouping variables to segregate the data, and then it infers independent oscillator models for each group.  All else being equal, data with fewer groups or more individuals per group will yield results with better significance.  Of course a grouping variable must assume at least two values to have any explanatory power.
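The segregation step can be pictured with a short Python sketch. This is not the TIES implementation: the records, the column name first_language, and the function split_by_group are hypothetical, and the language codes follow the example above (1=Catalan, 3=Urdu).

```python
from collections import defaultdict

# Hypothetical per-individual records, one per participant.
INDIVIDUALS = [
    {"Dyad": 2, "first_language": 1},
    {"Dyad": 3, "first_language": 3},
    {"Dyad": 7, "first_language": 1},
    {"Dyad": 9, "first_language": 3},
]


def split_by_group(records, group_col):
    """Partition records by the value of a grouping variable.

    The values are used only as dictionary keys, so their numeric order
    plays no role. A single-valued variable is rejected because it cannot
    explain anything.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[group_col]].append(record)
    if len(groups) < 2:
        raise ValueError("a grouping variable needs at least two distinct values")
    return dict(groups)


if __name__ == "__main__":
    for value, members in split_by_group(INDIVIDUALS, "first_language").items():
        print(value, len(members))
```

Each resulting group would then receive its own independent model, which is why fewer groups, or more individuals per group, leave more data behind each model.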

A moderator is intended for numerical values that have a meaningful natural order.  Examples: age, body-mass index, number of siblings.  A moderator category might assume discrete values, but the order of the values naturally has meaning.  For example, if body-mass index truly helps predict good oscillator parameters, then two individuals with BMIs of 30 and 31 (ceteris paribus) will have oscillator parameters more similar to each other than to those of an individual with BMI of 20.

The TIES modeler uses moderators in a linear regression model, either to determine oscillator parameters, or (if the parameters are stochastic) to determine the distributions of the oscillator parameters.
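The role of a moderator can be illustrated with an ordinary least-squares line that maps a moderator value to a single oscillator parameter. The Python sketch below is a simplified stand-in, not the regression TIES actually performs: the function names, the BMI values, and the parameter values are all made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Evaluate a fitted (intercept, slope) pair at moderator value x."""
    intercept, slope = model
    return intercept + slope * x


if __name__ == "__main__":
    # Hypothetical data: one moderator value (BMI) and one oscillator
    # parameter per individual.
    bmi = [20.0, 24.0, 27.0, 30.0, 31.0]
    param = [0.52, 0.60, 0.66, 0.71, 0.74]
    model = fit_line(bmi, param)
    # Because the model is linear, nearby BMIs yield nearby parameters.
    print(predict(model, 30.0), predict(model, 31.0), predict(model, 20.0))
```

Under any such linear model with a nonzero slope, the predictions for BMIs of 30 and 31 lie closer to each other than either does to the prediction for a BMI of 20, which is the ordering property described above.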

Output

The analysis creates an output folder, using the name specified at launch time.  Inside it are subfolders holding the inference results and the results of baseline models for comparison.

Results from the TIES model

There is a subdirectory named shared-param-CLO which stores all the results of the training and testing. The errors subfolder shows fitting error.

File err-couples.txt

This file, in the errors subfolder, contains the RMS fit error between the data and the oscillator outputs, for each dyad in the input, when it is used for testing (not training).

This file can be useful for diagnosing problems: it reveals dyads whose data never fit well, which might mean the data are outliers or are corrupted somehow. Person-0 is the dyad member whose distinguisher value is zero, and Person-1 is the dyad member whose distinguisher value is one. The file shows errors during fitting (the early 80% of the data) in two columns, and errors during prediction (the late 20% of the data). A good fit will have low prediction error.
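The two kinds of error can be sketched as follows. This Python example only illustrates the RMS computation and the 80%/20% split described above; the function names are hypothetical, the split is taken by sample count, and missing data are not handled, none of which is confirmed to match what TIES does internally.

```python
import math


def rms_error(observed, predicted):
    """Root-mean-square difference between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def fit_and_prediction_error(observed, predicted, fit_fraction=0.8):
    """Return (fitting error, prediction error) for one person's time series.

    The early part of the series counts as the fitting window and the rest
    as the prediction window.
    """
    cut = int(len(observed) * fit_fraction)
    return (
        rms_error(observed[:cut], predicted[:cut]),
        rms_error(observed[cut:], predicted[cut:]),
    )


if __name__ == "__main__":
    # Hypothetical measurements and model outputs for a single person.
    observed = [0.50, 0.55, 0.57, 0.55, 0.56, 0.55, 0.54, 0.56, 0.58, 0.57]
    predicted = [0.51, 0.54, 0.56, 0.56, 0.56, 0.55, 0.55, 0.55, 0.53, 0.61]
    print(fit_and_prediction_error(observed, predicted))
```

A model that tracks the fitting window closely but drifts in the prediction window would show a small first number and a large second one.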

File err-summary.txt

This file contains the averages of the columns of err-couples.txt – that is, it shows the RMS fitting error averaged across time and across couples.

Results from baseline models

There are more subdirectories containing similarly-organized results for the three baseline models (flat average value, straight line fit, and independent coupled oscillator). The error results are found in error/err-couples.txt and error/err-summary.txt with the same interpretation as the results in shared-param-CLO (see above).
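For intuition about the two simplest baselines, the Python sketch below fits a flat average and a straight line to the early part of a series and measures the RMS error on the remainder. It is an assumption-laden illustration, not the baseline code shipped with TIES: the function names, the 80% split by sample count, and the test data are all hypothetical.

```python
import math


def flat_baseline(times, values):
    """Baseline 1: always predict the mean of the fitting window."""
    mean = sum(values) / len(values)
    return lambda t: mean


def line_baseline(times, values):
    """Baseline 2: least-squares straight line through the fitting window."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    stt = sum((t - mean_t) ** 2 for t in times)
    stv = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = stv / stt
    return lambda t: mean_v + slope * (t - mean_t)


def prediction_rms(make_model, times, values, fit_fraction=0.8):
    """Fit on the early samples, then report RMS error on the late samples."""
    cut = int(len(times) * fit_fraction)
    model = make_model(times[:cut], values[:cut])
    residuals = [values[i] - model(times[i]) for i in range(cut, len(times))]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


if __name__ == "__main__":
    # Hypothetical series that drifts upward over time.
    times = list(range(10))
    values = [0.50 + 0.01 * t for t in times]
    print(prediction_rms(flat_baseline, times, values))
    print(prediction_rms(line_baseline, times, values))
```

On a steadily drifting series like this one the line extrapolates well and the flat average does not; on oscillating data neither baseline can follow the signal, which is the motivation for comparing against oscillator models.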

Interpretation of results

In brief: as the baseline models become more sophisticated (the average is the simplest, the line fit is intermediate, and the independent CLO is the most sophisticated), the fit gets better but the predictions get worse. By taking a Bayesian approach and introducing (and learning) a prior distribution over oscillator characteristics, the TIES model predicts better than any of the baseline models.