State Machine
Internally, Pnyx has 4 primary states that control which method calls are legal. Specific methods
(such as read, write, parse, and print) move a Pnyx from one state to the next. Each state permits
only certain methods, and calling a method that is invalid for the current state throws an
IllegalStateException. The following diagram illustrates the primary states and their transitions.
Use Once
Once you are familiar with the fluent API, usage will feel natural. However, there are two
non-obvious design limitations. The first is simply that a Pnyx can only be used once. If your
project needs to reuse Pnyx objects, it is recommended to create a factory method for building
new instances.
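The factory approach could look like the sketch below. It is only an illustration: the `PnyxFactory` class and `NewCsvReader` method are hypothetical names, the `pnyx.net.fluent` namespace is an assumption, and the only Pnyx calls used are ones that appear in the examples on this page.

```csharp
using System;
using pnyx.net.fluent; // assumed namespace for the Pnyx class

public static class PnyxFactory
{
    // Hypothetical helper: returns a new, partially configured Pnyx on every call,
    // so callers never need to reuse an instance that has already been processed.
    public static Pnyx NewCsvReader(String csvText)
    {
        Pnyx p = new Pnyx();
        p.readString(csvText);
        p.parseCsv();
        return p;
    }
}

public static class Program
{
    public static void Main()
    {
        // First use: a fresh instance from the factory
        using (Pnyx p = PnyxFactory.NewCsvReader("a,b\n"))
        {
            p.selectColumns(2, 1);
            p.writeStdout();
        }

        // Second use: another fresh instance, rather than reusing the first
        using (Pnyx p = PnyxFactory.NewCsvReader("c,d\n"))
        {
            p.selectColumns(1);
            p.writeStdout();
        }
    }
}
```

The factory keeps the shared setup in one place, while each caller finishes the chain on its own instance.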
Input / Output
When looking at the state machine, it is important to note that all I/O reads and writes are done
in the process state. All other steps simply build the filters, transformers, bufferings, and
processors that are used during the process state.
Single Input and Output
The other non-obvious limitation is the assumption of only 1 input and 1 output. This limitation is
more a matter of style than a true limitation. Multiple inputs are accommodated via the cat method,
and multiple outputs are accommodated via the tee method. Since the most common usage will be a
single input and a single output, the API was written to enforce that usage, with explicit methods
for multiple inputs and outputs.
The following examples show how to use cat for reading from multiple sources, including how to
read as CSV.
using (Pnyx p = new Pnyx())
{
    p.cat(pn =>
    {
        pn.readString("Line one");
        pn.readString("Line two");
        pn.readString("Line three");
        // ...
    });
    p.writeStdout();
}
// outputs:
// Line one
// Line two
// Line three
using (Pnyx p = new Pnyx())
{
    p.cat(p2 =>
    {
        p2.asCsv(p3 =>
        {
            p3.readString("Line,one");
            p3.readString("Line,two");
            p3.readString("Line,three");
        });
    });
    p.selectColumns(2, 1);
    p.writeStdout();
}
// outputs:
// one,Line
// two,Line
// three,Line
Tee
In addition to cat, Pnyx has support for multiple outputs with the tee processor. The tee processor
internally creates a separate Pnyx object, which can be used for writing additional outputs. Since a separate Pnyx object
is used, the tee processor can also perform separate operations on the data to create different outputs. Finally, the tee method
does not change the state to End, and therefore, additional operations can be performed after the tee.
The example below shows using a tee to split a CSV by column to create 2 separate output files.
using (Pnyx p = new Pnyx())
{
    p.readString("1975,218M,\"Love Will Keep Us Together\"\n");
    p.parseCsv();
    p.tee(p2 =>
    {
        p2.selectColumns(1, 2);
        p2.write("us_population_by_year.csv");
        // outputs: 1975,218M
    });
    p.selectColumns(1, 3);
    p.write("top_songs_by_year.csv");
    // outputs: 1975,"Love Will Keep Us Together"
}
A subtle but powerful side effect of the tee processor is that the original state of the Pnyx is
unchanged, so it can be used for additional operations, including additional tees. The following
example shows using 2 tee processors.
using (Pnyx p = new Pnyx())
{
    p.readString("clientId: 123456\n");
    p.tee(p2 =>
    {
        p2.write("copy.txt");
        // outputs: clientId: 123456
    });
    p.parseDelimiter(": ");
    p.selectColumns(2);
    p.tee(p2 =>
    {
        p2.write("ids.txt");
        // outputs: 123456
    });
    p.print("delete from client where id = $0;");
    p.writeStdout();
    // outputs: delete from client where id = 123456;
}
Next
Suggested next steps:
- Line, learn more about Line operations
- Row, learn more about Row operations