struct::list(n) | Tcl Data Structures | struct::list(n) |
struct::list - Procedures for manipulating lists
package require Tcl 8.0
package require struct::list ?1.7?
::struct::list longestCommonSubsequence sequence1 sequence2 ?maxOccurs?
::struct::list longestCommonSubsequence2 sequence1 sequence2 ?maxOccurs?
::struct::list lcsInvert lcsData len1 len2
::struct::list lcsInvert2 lcs1 lcs2 len1 len2
::struct::list lcsInvertMerge lcsData len1 len2
::struct::list lcsInvertMerge2 lcs1 lcs2 len1 len2
::struct::list reverse sequence
::struct::list assign sequence varname ?varname?...
::struct::list flatten ?-full? ?--? sequence
::struct::list map sequence cmdprefix
::struct::list mapfor var sequence script
::struct::list filter sequence cmdprefix
::struct::list filterfor var sequence expr
::struct::list split sequence cmdprefix ?passVar failVar?
::struct::list fold sequence initialvalue cmdprefix
::struct::list shift listvar
::struct::list iota n
::struct::list equal a b
::struct::list repeat size element1 ?element2 element3...?
::struct::list repeatn value size...
::struct::list dbJoin ?-inner|-left|-right|-full? ?-keys varname? {keycol table}...
::struct::list dbJoinKeyed ?-inner|-left|-right|-full? ?-keys varname? table...
::struct::list swap listvar i j
::struct::list firstperm list
::struct::list nextperm perm
::struct::list permutations list
::struct::list foreachperm var list body
The ::struct::list namespace contains several useful commands for processing Tcl lists. Generally speaking, they implement algorithms more complex or specialized than the ones provided by Tcl itself.
It exports only a single command, struct::list. All functionality provided here can be reached through a subcommand of this command.
The return value of longestCommonSubsequence is a list of two lists of equal length. The first sublist holds indices into sequence1, and the second sublist holds indices into sequence2. Each pair of corresponding indices refers to equal elements in the two sequences; the subsequence returned is the longest possible.
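As a small illustration of the two index lists, the following sketch compares two short sequences whose longest common subsequence (b d e) is unique. The variable names are chosen for this example only.

```tcl
package require struct::list

set seq1 {a b c d e}
set seq2 {b d e f}

# The result holds two index lists of equal length.
foreach {idx1 idx2} [::struct::list longestCommonSubsequence $seq1 $seq2] break
# idx1 is {1 3 4}, idx2 is {0 1 2}

# Walk the index pairs; each pair points at equal elements.
foreach i $idx1 j $idx2 {
    puts "seq1($i) = [lindex $seq1 $i] matches seq2($j) = [lindex $seq2 $j]"
}
```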
As with longestCommonSubsequence, the return value is a list of two lists of equal length. The first sublist is of indices into sequence1, and the second sublist is of indices into sequence2. Each corresponding pair of indices corresponds to equal elements in the sequences. The sequence approximates the longest common subsequence.
For lcsInvert to be fully defined, the lengths of the two sequences have to be known; they are specified through len1 and len2.
The result is a list where each element describes one chunk of the differences between the two sequences. This description is a list containing three elements, a type and two pairs of indices into sequence1 and sequence2 respectively, in this order. The type can be one of three values: added, deleted, or changed.
sequence 1 = {a b r a c a d a b r a}
lcs 1      =   {1 2   4 5     8 9 10}
lcs 2      =   {0 1   3 4     5 6 7}
sequence 2 =   {b r i c a     b r a c}
Inversion  = {{deleted  {0  0} {-1 0}}
              {changed  {3  3}  {2 2}}
              {deleted  {6  7}  {4 5}}
              {added   {10 11}  {8 8}}}
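The inversion listed above can be reproduced with a short script. This is a sketch that reuses the index lists from the example; the lengths 11 and 9 are the lengths of the two example sequences. lcsInvert takes both index lists bundled in one value, while lcsInvert2 takes them as separate arguments, so both calls are expected to yield the same result.

```tcl
package require struct::list

# Index lists taken from the example above.
set lcs1 {1 2 4 5 8 9 10}
set lcs2 {0 1 3 4 5 6 7}

# Same data, passed bundled (lcsInvert) and separately (lcsInvert2).
set inv  [::struct::list lcsInvert  [list $lcs1 $lcs2] 11 9]
set inv2 [::struct::list lcsInvert2 $lcs1 $lcs2 11 9]

# Each chunk is a list of a type and two index pairs.
foreach chunk $inv {
    foreach {type range1 range2} $chunk break
    puts "$type: sequence1 $range1, sequence2 $range2"
}
```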
Notes:
Compared to lcsInvert, the result of lcsInvertMerge contains additional chunks of type unchanged. These new chunks describe the parts which are unchanged between the two sequences. This means that the result of this command describes both the changed and unchanged parts of the two sequences in one structure.
sequence 1 = {a b r a c a d a b r a}
lcs 1      =   {1 2   4 5     8 9 10}
lcs 2      =   {0 1   3 4     5 6 7}
sequence 2 =   {b r i c a     b r a c}
Inversion/Merge  = {{deleted   {0  0} {-1 0}}
                    {unchanged {1  2}  {0 1}}
                    {changed   {3  3}  {2 2}}
                    {unchanged {4  5}  {3 4}}
                    {deleted   {6  7}  {4 5}}
                    {unchanged {8 10}  {5 7}}
                    {added    {10 11}  {8 8}}}
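Because the merged result covers both changed and unchanged parts, it can be turned into a diff-like listing in a single pass. The following is a minimal sketch using the example data above; the "-", "+" and blank line prefixes are a presentation choice of this example, not something the package produces.

```tcl
package require struct::list

set seq1 {a b r a c a d a b r a}
set seq2 {b r i c a b r a c}
set lcs  {{1 2 4 5 8 9 10} {0 1 3 4 5 6 7}}

set lines {}
foreach chunk [::struct::list lcsInvertMerge $lcs [llength $seq1] [llength $seq2]] {
    foreach {type r1 r2} $chunk break
    # Element ranges of both sequences; only the side(s) relevant
    # for the chunk type are used below.
    set old [lrange $seq1 [lindex $r1 0] [lindex $r1 1]]
    set new [lrange $seq2 [lindex $r2 0] [lindex $r2 1]]
    switch -- $type {
        unchanged { lappend lines "  $old" }
        deleted   { lappend lines "- $old" }
        added     { lappend lines "+ $new" }
        changed   { lappend lines "- $old" "+ $new" }
    }
}
puts [join $lines \n]
```

For the example data this prints the deleted leading a, the unchanged run b r, the change from a to i, and so on through the added trailing c.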
If there are more variables specified than there are elements in the sequence the empty string will be assigned to the superfluous variables.
If there are more elements in the sequence than variable names specified, the subcommand returns a list containing the unassigned elements. Otherwise an empty list is returned.
tclsh> ::struct::list assign {a b c d e} foo bar
c d e
tclsh> set foo
a
tclsh> set bar
b
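The opposite case, more variables than elements, can be sketched in the same way. The variable names x, y, z and rest are chosen for this example only.

```tcl
package require struct::list

# Three variables, two elements: the surplus variable z is set to
# the empty string and nothing is left over.
set rest [::struct::list assign {a b} x y z]
puts "x=$x y=$y z=<$z> rest=<$rest>"
# prints: x=a y=b z=<> rest=<>
```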
By default the subcommand removes one level of nesting from the sequence. It will remove any nesting it finds if the option -full is specified.
tclsh> ::struct::list flatten {1 2 3 {4 5} {6 7} {{8 9}} 10}
1 2 3 4 5 6 7 {8 9} 10
tclsh> ::struct::list flatten -full {1 2 3 {4 5} {6 7} {{8 9}} 10}
1 2 3 4 5 6 7 8 9 10
The command prefix will be evaluated with a single word appended to it. The evaluation takes place in the context of the caller of the subcommand.
tclsh> # squaring all elements in a list
tclsh> proc sqr {x} {expr {$x*$x}}
tclsh> ::struct::list map {1 2 3 4 5} sqr
1 4 9 16 25
tclsh> # Retrieving the second column from a matrix
tclsh> # given as list of lists.
tclsh> proc projection {n list} {::lindex $list $n}
tclsh> ::struct::list map {{a b c} {1 2 3} {d f g}} {projection 1}
b 2 f
The script will be evaluated as is, and has access to the current list element through the specified iteration variable var. The evaluation takes place in the context of the caller of the subcommand.
tclsh> # squaring all elements in a list
tclsh> ::struct::list mapfor x {1 2 3 4 5} {
    expr {$x * $x}
}
1 4 9 16 25
tclsh> # Retrieving the second column from a matrix
tclsh> # given as list of lists.
tclsh> ::struct::list mapfor x {{a b c} {1 2 3} {d f g}} {
    lindex $x 1
}
b 2 f
The command prefix will be evaluated with a single word appended to it. The evaluation takes place in the context of the caller of the subcommand.
tclsh> # removing all odd numbers from the input
tclsh> proc even {x} {expr {($x % 2) == 0}}
tclsh> ::struct::list filter {1 2 3 4 5} even
2 4
Note: The filter is a specialized application of fold where the result is extended with the current item or not, depending on the result of the test.
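The relationship between filter and fold can be made concrete. In this sketch the helper keepEven is a hypothetical folding step written for this example; it is not how the package implements filter internally.

```tcl
package require struct::list

# Folding step: extend the accumulated list only if the item is even.
proc keepEven {acc item} {
    if {($item % 2) == 0} { lappend acc $item }
    return $acc
}

# Test as used by filter.
proc even {x} {expr {($x % 2) == 0}}

set viaFold   [::struct::list fold   {1 2 3 4 5} {} keepEven]
set viaFilter [::struct::list filter {1 2 3 4 5} even]
puts "$viaFold | $viaFilter"
# prints: 2 4 | 2 4
```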
The expression will be evaluated as is, and has access to the current list element through the specified iteration variable var. The evaluation takes place in the context of the caller of the subcommand.
tclsh> # removing all odd numbers from the input
tclsh> ::struct::list filterfor x {1 2 3 4 5} {($x % 2) == 0}
2 4
If no variable names are specified then the result of the command will be a list containing the list of passing elements, and the list of failing elements, in this order. Otherwise the lists of passing and failing elements are stored into the two specified variables, and the result will be a list containing two numbers, the number of elements passing the test, and the number of elements failing, in this order.
The interface to the test is the same as used by filter.
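Both calling conventions of split are shown in the following sketch. The test procedure even and the variable names pass and fail are chosen for this example.

```tcl
package require struct::list

proc even {x} {expr {($x % 2) == 0}}

# Without variable names: the result holds both lists.
set both [::struct::list split {1 2 3 4 5} even]
puts $both
# prints: {2 4} {1 3 5}

# With variable names: the lists are stored in the variables
# and the result holds the two counts.
set counts [::struct::list split {1 2 3 4 5} even pass fail]
puts "$counts / $pass / $fail"
# prints: 2 3 / 2 4 / 1 3 5
```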
The command prefix will be evaluated with two words appended to it. The second of these words will always be an element of the sequence. The evaluation takes place in the context of the caller of the subcommand.
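It then reduces the sequence into a single value through repeated application of the command prefix and returns that value. This reduction is done by applying the command prefix to the initial value and the first element of the sequence, then to the result of that call and the second element, and so on until the last element has been processed.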
tclsh> # summing the elements in a list.
tclsh> proc + {a b} {expr {$a + $b}}
tclsh> ::struct::list fold {1 2 3 4 5} 0 +
15
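Since addition hides the order of evaluation, a non-commutative step makes the reduction easier to follow. The procedure wrap is a hypothetical helper for this example; it records how the accumulated value (first word) and the current element (second word) are combined.

```tcl
package require struct::list

# The first argument is the initial value or the previous result,
# the second argument is always an element of the sequence.
proc wrap {acc item} { return "($acc $item)" }

set trace [::struct::list fold {a b c} z wrap]
puts $trace
# prints: (((z a) b) c)
```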
For "n == 0" iota returns an empty list.
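A brief sketch of iota, alone and as the input of mapfor (the variable names are chosen for this example):

```tcl
package require struct::list

set indices [::struct::list iota 5]
puts $indices
# prints: 0 1 2 3 4

# iota is convenient as the sequence for other subcommands.
set squares [::struct::list mapfor i $indices {expr {$i * $i}}]
puts $squares
# prints: 0 1 4 9 16
```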
A boolean value will be returned as the result of the command. This value will be true if the two lists are equal, and false otherwise.
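A few calls illustrate equal. The last one is meant to show that the comparison works on list elements, so differing whitespace between elements should not matter; treat that reading as an assumption about the implementation rather than a documented guarantee.

```tcl
package require struct::list

set same  [::struct::list equal {a b c} {a b c}]
set diff  [::struct::list equal {a b c} {a b d}]
set short [::struct::list equal {a b c} {a b}]
# Same elements, different spacing in the string representation.
set space [::struct::list equal {a   b} {a b}]

puts "$same $diff $short $space"
# prints: 1 0 0 1
```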
Examples:
tclsh> ::struct::list repeat 3 a
a a a
tclsh> ::struct::list repeat 3 [::struct::list repeat 3 0]
{0 0 0} {0 0 0} {0 0 0}
tclsh> ::struct::list repeat 3 a b c
a b c a b c a b c
tclsh> ::struct::list repeat 3 [::struct::list repeat 2 a] b c
{a a} b c {a a} b c {a a} b c
A single argument size which is a list of more than one element will be treated as if more than one argument size was specified.
If only one argument size is present the returned list will not be nested, of length size and contain value in all positions. If more than one size argument is present the returned list will be nested, and of the length specified by the last size argument given to it. The elements of that list are defined as the result of repeatn for the same arguments, but with the last size value removed.
An empty list will be returned if no size arguments are present.
tclsh> ::struct::list repeatn 0 3 4
{0 0 0} {0 0 0} {0 0 0} {0 0 0}
tclsh> ::struct::list repeatn 0 {3 4}
{0 0 0} {0 0 0} {0 0 0} {0 0 0}
tclsh> ::struct::list repeatn 0 {3 4 5}
{{0 0 0} {0 0 0} {0 0 0} {0 0 0}} {{0 0 0} {0 0 0} {0 0 0} {0 0 0}} {{0 0 0} {0 0 0} {0 0 0} {0 0 0}} {{0 0 0} {0 0 0} {0 0 0} {0 0 0}} {{0 0 0} {0 0 0} {0 0 0} {0 0 0}}
If the option -keys is present, its argument is the name of a variable in which to store the full list of found keys. Depending on the exact nature of the input tables and the join mode, the output table may not contain all the keys by default. In such a case the caller can declare a variable for this information and then insert it into the output table on their own, as the caller has more information about the placement than this command.
What is left to explain is the format of the arguments.
The keycol arguments are the indices of the columns in the tables which contain the key values to use for the joining. Each argument applies to the table following immediately after it. The columns are counted from 0, which references the first column. The table associated with the column index has to have at least keycol+1 columns. An error will be thrown if there are fewer.
The table arguments represent a table or matrix of rows and columns of values. We use the same representation as generated and consumed by the methods get rect and set rect of matrix objects. In other words, each argument is a list, representing the whole matrix. Its elements are lists too, each representing a single row of the matrix. The elements of the row-lists are the column values.
The table resulting from the join operation is returned as the result of the command. We use the same representation as described above for the input tables.
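The argument format can be sketched with the two small tables used in the join examples of this section. Each table is a list of rows, and each keycol precedes the table it applies to. The rows are sorted before printing because this sketch makes no assumption about the order in which the command returns them.

```tcl
package require struct::list

set left  {{0 foo}   {1 snarf} {2 blue}}
set right {{0 bagel} {1 snatz} {3 driver}}

# Key column 0 in both tables.
set inner [::struct::list dbJoin -inner 0 $left 0 $right]
set full  [::struct::list dbJoin -full  0 $left 0 $right]

foreach row [lsort $inner] { puts "inner: $row" }
foreach row [lsort $full]  { puts "full:  $row" }
```

Rows without a partner in the other table are padded with empty strings in the full outer join, for example {2 blue {} {}}.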
The algorithm used here is by Donald E. Knuth, see section REFERENCES for details.
The longestCommonSubsequence subcommand forms the core of a flexible system for doing differential comparisons of files, similar to the capability offered by the Unix command diff. While this procedure is quite rapid for many tasks of file comparison, its performance degrades severely if sequence2 contains many equal elements (as, for instance, when using this procedure to compare two files, a quarter of whose lines are blank). This drawback is intrinsic to the algorithm used (see the Reference for details).
One approach to dealing with the performance problem that is sometimes effective in practice is arbitrarily to exclude elements that appear more than a certain number of times. This number is provided as the maxOccurs parameter. If frequent lines are excluded in this manner, they will not appear in the common subsequence that is computed; the result will be the longest common subsequence of infrequent elements. The procedure longestCommonSubsequence2 implements this heuristic. It functions as a wrapper around longestCommonSubsequence; it computes the longest common subsequence of infrequent elements, and then subdivides the subsequences that lie between the matches to approximate the true longest common subsequence.
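A usage sketch of the heuristic follows. The two "files" are hypothetical lists of lines in which blank lines (the empty string) are frequent, and the threshold 2 is an arbitrary choice for this example. Because the result only approximates the longest common subsequence, the sketch does not predict the exact indices; it only checks the documented property that every index pair refers to equal elements.

```tcl
package require struct::list

# Lines of two hypothetical files; {} stands for a blank line.
set old {alpha {} beta {} gamma {} delta}
set new {alpha {} {} beta {} gamma epsilon}

# Exclude elements that occur more than 2 times from the matching.
foreach {idx1 idx2} [::struct::list longestCommonSubsequence2 $old $new 2] break

# Every index pair must refer to equal elements.
set ok [expr {[llength $idx1] == [llength $idx2]}]
foreach i $idx1 j $idx2 {
    if {[lindex $old $i] ne [lindex $new $j]} { set ok 0 }
}
puts "matched [llength $idx1] elements, consistent: $ok"
```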
The join is an operation from relational algebra for relational databases.
The easiest way to understand the regular inner join is that it creates the cartesian product of all the tables involved first and then keeps only those rows in the resulting table for which the values in the specified key columns are equal to each other.
Implementing this description naively, i.e. exactly as described above, will generate a huge intermediate result. To avoid this, the cartesian product and the filtering of rows are done at the same time. What is required is a fast way to determine if a key is present in a table. In a true database this is done through indices. Here we use arrays internally.
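The idea of combining product and filtering through an array index can be sketched in plain Tcl. The procedure innerJoin below is a hypothetical illustration written for this text; it is not the code of the package and handles only two tables and the inner join.

```tcl
# Inner join of two tables on one key column each (illustrative helper).
proc innerJoin {kcolA tableA kcolB tableB} {
    # Index table B by key: the array maps a key to all rows carrying it.
    array set index {}
    foreach row $tableB {
        lappend index([lindex $row $kcolB]) $row
    }
    # Scan table A once and emit combined rows only for keys found in B.
    # The full cartesian product is never materialized.
    set result {}
    foreach rowA $tableA {
        set key [lindex $rowA $kcolA]
        if {![info exists index($key)]} continue
        foreach rowB $index($key) {
            lappend result [concat $rowA $rowB]
        }
    }
    return $result
}

set joined [innerJoin 0 {{0 foo} {1 snarf} {2 blue}} 0 {{0 bagel} {1 snatz} {3 driver}}]
puts $joined
# prints: {0 foo 0 bagel} {1 snarf 1 snatz}
```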
An outer join is an extension of the inner join for two tables. There are three variants of outer joins, called left, right, and full outer joins. Their result always contains all rows from an inner join and then some additional rows.
We extend all the joins from two to n tables (n > 2) by executing
(...((table1 join table2) join table3) ...) join tableN
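For the inner join, the left-to-right scheme can be checked by comparing one call with three tables against two explicit two-table joins. This is a sketch under the assumption that key column 0 of the intermediate table still holds the key; the third table t3 is made up for this example. The comparison sorts the rows first, since no particular row order is assumed.

```tcl
package require struct::list

set t1 {{0 foo}   {1 snarf} {2 blue}}
set t2 {{0 bagel} {1 snatz} {3 driver}}
set t3 {{1 red}   {2 green}}

# One call with three tables ...
set direct [::struct::list dbJoin -inner 0 $t1 0 $t2 0 $t3]

# ... compared with joining step by step, left to right.
set step   [::struct::list dbJoin -inner 0 $t1 0 $t2]
set nested [::struct::list dbJoin -inner 0 $step 0 $t3]

set agree [::struct::list equal [lsort $direct] [lsort $nested]]
puts "direct and nested joins agree: $agree"
```

Only key 1 is present in all three tables, so both variants are expected to contain the single row {1 snarf 1 snatz 1 red}.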
Examples for all the joins:
Inner Join
{0 foo}              {0 bagel}    {0 foo   0 bagel}
{1 snarf} inner join {1 snatz}  = {1 snarf 1 snatz}
{2 blue}             {3 driver}
Left Outer Join
{0 foo}                   {0 bagel}    {0 foo   0 bagel}
{1 snarf} left outer join {1 snatz}  = {1 snarf 1 snatz}
{2 blue}                  {3 driver}   {2 blue  {} {}}
Right Outer Join
{0 foo}                    {0 bagel}    {0 foo   0 bagel}
{1 snarf} right outer join {1 snatz}  = {1 snarf 1 snatz}
{2 blue}                   {3 driver}   {{} {}   3 driver}
Full Outer Join
{0 foo}                   {0 bagel}    {0 foo   0 bagel}
{1 snarf} full outer join {1 snatz}  = {1 snarf 1 snatz}
{2 blue}                  {3 driver}   {2 blue  {} {}}
                                       {{} {}   3 driver}
This document, and the package it describes, will undoubtedly contain bugs and other problems. Please report such in the category struct :: list of the Tcllib SF Trackers [http://sourceforge.net/tracker/?group_id=12883]. Please also report any ideas for enhancements you may have for either package and/or documentation.
assign, common, comparison, diff, differential, equal, equality, filter, first permutation, flatten, folding, full outer join, generate permutations, inner join, join, left outer join, list, longest common subsequence, map, next permutation, outer join, permutation, reduce, repeating, repetition, reverse, right outer join, subsequence, swapping
Data structures
Copyright (c) 2003-2005 by Kevin B. Kenny. All rights reserved.
Copyright (c) 2003-2008 Andreas Kupries <andreas_kupries@users.sourceforge.net>
1.7 | struct |