Set Up an Experiment

Four main steps must be completed to run your own experiment in eDom:

  1. Prepare the definition files.
  2. Create a file containing the full list of words to be normed in the experiment (the population list).
  3. (Optional) If participants will only be rating a subset of the words in the population list, prepare sample lists from that population for each participant.
  4. Create a configuration file that specifies several parameters eDom needs to run.

Preparing the Definition Files

Definitions for each meaning of an ambiguous word must be stored in separate text files on the basis of the conventions outlined below.  

Note that these conventions originate from the spider and parser tools that were used to collect relevant information from the Wordsmyth.net website; these data were in turn presented to participants in the eDom norming experiment reported in Armstrong, Tokowicz, and Plaut (2012).  Consequently, researchers who wish to use the same definitions in their own study may find the original Wordsmyth.net spider and parsing tool relevant.  This tool was written in an older version of Python (2.4.x) and works in two steps: 1) scraping (spidering) the site for definitions and then 2) parsing these definitions to extract the information needed for a study.  To the extent that the Wordsmyth site has changed since this tool was originally developed, minor or major modifications of this code may be required.

Alternatively, a pool of parsed word definitions may be obtained by contacting Blair Armstrong (see contact info) and providing a list of target items. 

File conventions

The mined definitions were stored in files with the following filename structure, also used by eDom:

<word>.singleword.<Meaning #>.<part of speech>.txt

For example:

bank.singleword.1.noun.txt

In practice, eDom only requires that the file name be made up of 5 period-separated segments, and it ignores everything except the <word> and <meaning #> fields.  The only exceptions are files that end in ".all.txt", which are ignored by eDom.
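
The naming rule above can be illustrated with a short Python sketch.  It is not eDom's own code: the function name parse_definition_filename is hypothetical, and raising an error for names that do not have 5 segments is an assumption about how such files should be treated.

```python
def parse_definition_filename(filename):
    """Return (word, meaning_number), or None if the file would be ignored.

    Hypothetical helper for illustration; not part of eDom itself.
    """
    # Files ending in ".all.txt" are ignored by eDom.
    if filename.endswith(".all.txt"):
        return None
    segments = filename.split(".")
    # The name must consist of exactly 5 period-separated segments.
    # Raising an error here is an assumption of this sketch.
    if len(segments) != 5:
        raise ValueError(f"expected 5 period-separated segments: {filename!r}")
    # Only the first (<word>) and third (<meaning #>) segments are used.
    word, _, meaning, _, _ = segments
    return word, int(meaning)


print(parse_definition_filename("bank.singleword.1.noun.txt"))  # ('bank', 1)
```

Running a check of this kind over the definitions folder before an experiment may help catch misnamed files early.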

Note that you do not need to place all of the definitions associated with a particular meaning into a single file; you could for instance separate the different parts of speech which are all associated with the same meaning into different files.  eDom will automatically merge all of the definitions associated with a particular meaning number into a single list during norming.  For instance, for files:

bank.singleword.1.noun.txt
bank.singleword.1.verb.txt
bank.singleword.2.noun.txt

All of the definitions listed in the two files with meaning #1 would be combined and would be presented separately from those for meaning #2.  Note that the order of the definitions within each file is always preserved and is not randomized in the current version of eDom.
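
As a rough illustration of this merging, the following Python sketch groups definitions by word and meaning number.  It is not eDom's implementation: the function name merge_definitions is hypothetical, the definitions are placeholders, and combining files in sorted file-name order is an assumption (the text above only states that the order within each file is preserved).

```python
from collections import defaultdict


def merge_definitions(files):
    """Group definitions by (word, meaning number).

    `files` maps a file name to the list of definitions it contains.
    Hypothetical helper for illustration; not part of eDom itself.
    """
    merged = defaultdict(list)
    # Sorted file-name order across files is an assumption of this sketch.
    for name in sorted(files):
        if name.endswith(".all.txt"):
            continue  # ignored by eDom
        word, _, meaning, _, _ = name.split(".")
        # Within a file, the original order of the definitions is preserved.
        merged[(word, int(meaning))].extend(files[name])
    return dict(merged)


files = {
    "bank.singleword.1.noun.txt": ["<noun definition A>", "<noun definition B>"],
    "bank.singleword.1.verb.txt": ["<verb definition A>"],
    "bank.singleword.2.noun.txt": ["<noun definition C>"],
}
for key, definitions in merge_definitions(files).items():
    print(key, definitions)
```

Here the noun and verb files for meaning #1 end up in one list, while meaning #2 keeps its own list.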

Note that the order of the meaning numbers is not arbitrary but should be based on estimates of which meaning is dominant and which meanings are subordinate.  This is the standard ordering of an ambiguous word's meanings in the dictionary, and eDom uses this ordering to generate scores that are suitable for running a sign test which evaluates whether meaning #1 was the most dominant meaning in the empirical ratings (for details, see eDom Output).

It is also important that you follow the sequence of 1...2...3...n without skipping any meaning numbers, as a missing meaning in a sequence will generate an error.  eDom currently supports rating up to 6 meanings per word.  
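
A check along these lines can be run on your own files before starting an experiment.  The sketch below is illustrative only; the function name check_meaning_numbers and the error messages are hypothetical and do not reproduce eDom's own error handling.

```python
MAX_MEANINGS = 6  # limit stated above for the current version of eDom


def check_meaning_numbers(word, meaning_numbers):
    """Raise ValueError unless the numbers form the sequence 1..n with n <= 6.

    Hypothetical helper for illustration; not part of eDom itself.
    """
    # Several files may share a meaning number, so duplicates are collapsed.
    found = sorted(set(meaning_numbers))
    expected = list(range(1, len(found) + 1))
    if found != expected:
        raise ValueError(f"{word}: meanings {found} are not the sequence {expected}")
    if len(found) > MAX_MEANINGS:
        raise ValueError(f"{word}: {len(found)} meanings exceed the limit of {MAX_MEANINGS}")


check_meaning_numbers("bank", [1, 1, 2])    # passes: two files share meaning #1
try:
    check_meaning_numbers("bank", [1, 3])   # meaning #2 is missing
except ValueError as err:
    print(err)
```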

All of the definition files should be placed within a single directory.  Although the location of this directory may be user-specified, by convention it should be placed in ./eDom/input/definitions/.

Example definition files are included in the demo norming experiment and are stored in the above-mentioned folder.

Create a Population List

A list of all of the words that will be normed in the experiment (but not necessarily by each participant) should be stored in a text file, with each word on a separate line.  By convention, this file should be stored as ./eDom/input/<populationName>.pop.txt.

For example, the demo experiment contains the population file ./eDom/input/testPop.pop.txt.  
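
Because the population list and the definition files are prepared separately, it can be useful to cross-check them.  The following Python sketch is one way to do so; the function name words_without_definitions and the example word "calf" are hypothetical, and skipping blank lines is an assumption of the sketch.

```python
def words_without_definitions(population_text, definition_filenames):
    """Return population words that have no definition file.

    Hypothetical helper for illustration; not part of eDom itself.
    """
    # One word per line; blank lines are skipped (an assumption of this sketch).
    population = [line.strip() for line in population_text.splitlines() if line.strip()]
    # The <word> field is the first period-separated segment of the file name.
    defined = {
        name.split(".")[0]
        for name in definition_filenames
        if not name.endswith(".all.txt")
    }
    return [word for word in population if word not in defined]


print(words_without_definitions("bank\ncalf\n", ["bank.singleword.1.noun.txt"]))  # ['calf']
```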

Prepare Sample Lists for Each Participant (optional)

If there are more definitions in the population than can be normed by a single participant, it may be necessary to have each participant norm a subset of the total population of definitions.  A helper script is included with eDom that helps generate sets of lists that ensure equal sampling of each word in the population across a set of participants.  

To use this script, Python (version >= 3.0) must be installed on your computer; this also applies to users of the standalone version of eDom who wish to take advantage of this functionality.  Instructions for downloading, installing, and using Python are available at www.python.org.

Once Python has been installed, you will need to edit ./eDom/randomized_lists/randomizedLists.py based on your particular set of words.  This can be done in a text editor such as Notepad, or in an IDE (e.g., PyScripter, from code.google.com/p/pyscripter).

There are four user parameters which must be set at the beginning of the file.  The first two specify the location of the population list and the location where the participant sample lists should be saved (the latter directory can usually remain unchanged).

The third parameter is the maximum length of each list (listLen).  Each sample is created by randomly sampling from the population list without replacement until either the population list is empty or this maximum length is reached.  Once the population list has been emptied, the population is reset for the next sample.
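
The sampling procedure just described can be sketched as follows.  This is a minimal illustration written from the description above, not the code in randomizedLists.py: the function name make_sample_lists and the seed argument are hypothetical, and the words are placeholders.

```python
import random
from collections import Counter


def make_sample_lists(population, list_len, perm, seed=None):
    """Return `perm` lists drawn without replacement from `population`.

    Hypothetical helper for illustration; not the code in randomizedLists.py.
    """
    rng = random.Random(seed)
    pool = []    # words not yet sampled in the current pass
    lists = []
    for _ in range(perm):
        if not pool:
            # The population has been emptied: reset it for the next sample.
            pool = list(population)
            rng.shuffle(pool)
        # Take up to list_len words; the last list of a pass may be shorter.
        lists.append(pool[:list_len])
        pool = pool[list_len:]
    return lists


population = [f"word{i}" for i in range(1, 8)]   # 7 placeholder words
lists = make_sample_lists(population, list_len=3, perm=6, seed=0)
print([len(sample) for sample in lists])          # [3, 3, 1, 3, 3, 1]
print(Counter(word for sample in lists for word in sample))
```

In this example two full passes over the population are made, so every word appears in exactly two lists, but the last list of each pass contains only the single leftover word.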

To ensure approximately equal sample sizes for all participants, it is useful to set the list length to <avg # words a participant should rate> - 1, so that when the last sample is created before the population is depleted, that participant is left with approximately the same number of words to rate as the other participants.

The final parameter which must be set is the total number of participant sample lists (perm) that should be created.  To ensure equal sampling of all words, this should be set to a multiple of <# words in population>/<avg # words rated by each participant>.  To avoid having to generate additional sample lists at a later time, it is typically useful to generate more lists than you expect to need.
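
As a worked example of this arithmetic, the Python sketch below rounds a planned number of participants up to the next value of perm that completes a whole number of passes through the population.  The function names and the numbers are hypothetical, and using listLen as the divisor and rounding up when the population does not divide evenly are assumptions of the sketch.

```python
import math


def lists_per_pass(population_size, list_len):
    """Number of lists produced by one full pass through the population."""
    # Rounding up accounts for a shorter final list (an assumption of this sketch).
    return math.ceil(population_size / list_len)


def recommended_perm(population_size, list_len, planned_participants):
    """Smallest multiple of lists_per_pass that covers the planned participants."""
    per_pass = lists_per_pass(population_size, list_len)
    passes = math.ceil(planned_participants / per_pass)
    return passes * per_pass


# Hypothetical numbers: 120 words, 20 words per list, 40 planned participants.
print(recommended_perm(population_size=120, list_len=20, planned_participants=40))  # 42
```

With these numbers one pass yields 6 lists, so 40 planned participants are rounded up to 42 lists (7 complete passes).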

Finally, run the script by invoking the Python binary with the randomizedLists.py filename as an argument.  For instance:

python ~/eDom/randomized_Lists/randomizedLists.py

See the python.org website for additional information on running Python scripts.

Creating a Configuration File

Finally, a number of parameters must be stored in a configuration file, typically ./eDom/input/config.conf.  These parameters are:

  1. The name of the folder containing the definition files.
  2. The sampling method to use: either 'population' if each participant should rate all of the words, or 'participantList' if each participant will be rating a subset of the words (see 'Prepare Sample Lists for Each Participant', above).
  3. Either the filename of the population list if 'population' mode is used in #2, or the name of the folder that contains the participant lists if 'participantList' is used.  If a folder name is specified it must have a trailing slash.
  4. The order in which words should be presented from either the population or the participant lists: either 'random', which shuffles the list, or 'fixed', which uses the ordering of the list as-is.
  5. A flag specifying whether norms can be collected for words for which no definitions have been provided.  'allowNoDefs' will present words without requiring that any definitions be provided for those words.  This allows a word's definitions to be collected from scratch.  'denyNoDefs' will cause an error if there are no definitions associated with a word, which can help prevent inadvertently omitting the definitions for a word.
  6. The number of words to rate between breaks (must be an integer > 0).  
  7. The number of extra empty 'definition slots' that a participant can use to supply their own definitions for a word.  A minimum value of 2 is recommended as this ensures that at least the two definitions that account for the largest portion of a word's variance are included in the norms.  Note that the current version of eDom is limited to a total of 6 definition slots across both the supplied meanings of a word and the empty definition slots, and the inclusion of pre-specified definitions supersedes the inclusion of empty definition slots.

For example, the demo experiment's configuration file, ./eDom/input/config.conf, contains the following:

./input/definitions/
population
./input/testPop.txt
random 
allowNoDefs
3
2
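
A file in this seven-line layout can be sanity-checked before an experiment with a short Python script such as the sketch below.  It is not eDom's own parser: the function name parse_config and the field names are hypothetical labels for parameters 1 to 7, and stripping whitespace and skipping blank lines are assumptions of the sketch.

```python
# Hypothetical labels for parameters 1-7, in the order listed above.
FIELDS = ["definitions_dir", "sampling_method", "list_source",
          "presentation_order", "no_defs_flag",
          "words_between_breaks", "extra_slots"]


def parse_config(text):
    """Parse and validate the seven configuration values (illustrative only)."""
    # Stripping whitespace and skipping blank lines is an assumption of this sketch.
    values = [line.strip() for line in text.splitlines() if line.strip()]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, found {len(values)}")
    config = dict(zip(FIELDS, values))
    if config["sampling_method"] not in ("population", "participantList"):
        raise ValueError("sampling method must be 'population' or 'participantList'")
    # A folder of participant lists must be given with a trailing slash.
    if (config["sampling_method"] == "participantList"
            and not config["list_source"].endswith("/")):
        raise ValueError("participant list folder must end with a trailing slash")
    if config["presentation_order"] not in ("random", "fixed"):
        raise ValueError("order must be 'random' or 'fixed'")
    if config["no_defs_flag"] not in ("allowNoDefs", "denyNoDefs"):
        raise ValueError("flag must be 'allowNoDefs' or 'denyNoDefs'")
    config["words_between_breaks"] = int(config["words_between_breaks"])
    if config["words_between_breaks"] <= 0:
        raise ValueError("words between breaks must be an integer > 0")
    config["extra_slots"] = int(config["extra_slots"])
    if config["extra_slots"] < 0:
        raise ValueError("number of extra definition slots cannot be negative")
    return config


demo = """./input/definitions/
population
./input/testPop.txt
random
allowNoDefs
3
2
"""
print(parse_config(demo))
```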


Once these steps have been completed, you are ready to Run eDom.


Blair Armstrong, Natasha Tokowicz, David Plaut, 2011-