Basics
Pnyx is built upon a suite of interfaces, all of which can be found in pnyx.net.api. These interfaces
can be roughly divided into four categories: line, row, object and nameValuePairs.
- Pnyx defines a line as a String, without line termination.
- Row is defined as a list of nullable strings, List<String>, without delimiters or line termination.
- Object is defined as an Object.
- NameValuePairs is defined as a Dictionary<String, Object?>.
In all cases, formatting is used only when reading from and writing to files or streams. As of Pnyx 2.0.1, there is no native support
for streaming nameValuePairs or objects. Drop us a line if you need this functionality.
Internally, data is processed independently of its source. For example, CSV and tab-delimited files are both represented as rows, with
formatting only being used to read from the source. Formatting (CSV vs. Tab), encoding (ASCII vs. UTF), and line-feeds
(Windows vs. Unix) are all used when reading and writing, but ignored when filtering and transforming. Formatting, encoding, and
line-feeds are preserved by Pnyx, which frees developers to work on higher-level functionality.
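To make the four categories concrete, the sketch below holds one record in each shape. It uses only standard .NET types; the variable names, the sample record, and the choice to represent the empty last column as null are assumptions made for illustration and are not taken from Pnyx.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Line: one record as text, without its line termination
string line = "1,\"Go, Pnyx Go\",";

// Row: the same record split into columns, without delimiters or quoting.
// Using null for the empty last column is an assumption for this sketch.
List<string?> row = new List<string?> { "1", "Go, Pnyx Go", null };

// Object: any .NET value
object obj = 42;

// NameValuePairs: names mapped to nullable values (the names are hypothetical)
Dictionary<string, object?> pairs = new Dictionary<string, object?>
{
    ["id"] = 1,
    ["title"] = "Go, Pnyx Go",
    ["notes"] = null
};

Console.WriteLine($"line: {line}");
Console.WriteLine($"row columns: {row.Count}, object: {obj}, pairs: {pairs.Count}");
```

Note that the comma inside the quoted title is part of the data in the row form, while the commas between columns exist only in the line form.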
Operations
For all categories, Pnyx has built-in support for several operations. For each operation, a corresponding interface
is available. When choosing an interface, it is best to use the simplest one that meets your needs, because there are
several higher-level operations for chaining and combining filters. The table below illustrates the core interfaces in Pnyx:
Looking at the table above, the Filter is the simplest operator and Buffering is the most complicated. Filters can be chained together
with boolean operators (and, or, not, xor) and can be shimmed (see Shims). Transforms can be shimmed, while Buffering has no special
operations available.
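The idea of combining filters with boolean operators can be sketched with plain delegates. This is only an illustration of the concept: the Func<string, bool> stand-ins and the And, Or, Not and Xor helpers below are hypothetical and are not Pnyx's own types or methods.

```csharp
using System;

// Stand-in for a line filter: true keeps the line, false ignores it (not a Pnyx interface)
Func<string, bool> hasHello = line => line.Contains("hello", StringComparison.OrdinalIgnoreCase);
Func<string, bool> hasWorld = line => line.Contains("world", StringComparison.OrdinalIgnoreCase);

// Hypothetical combinators: each builds a new filter out of existing ones
Func<string, bool> And(Func<string, bool> a, Func<string, bool> b) => line => a(line) && b(line);
Func<string, bool> Or(Func<string, bool> a, Func<string, bool> b) => line => a(line) || b(line);
Func<string, bool> Not(Func<string, bool> a) => line => !a(line);
Func<string, bool> Xor(Func<string, bool> a, Func<string, bool> b) => line => a(line) ^ b(line);

Func<string, bool> both = And(hasHello, hasWorld);
Func<string, bool> any = Or(hasHello, hasWorld);
Func<string, bool> exactlyOne = Xor(hasHello, hasWorld);
Func<string, bool> noWorld = Not(hasWorld);

Console.WriteLine(both("Hello World"));        // True
Console.WriteLine(any("Goodbye"));             // False
Console.WriteLine(exactlyOne("Hello World"));  // False
Console.WriteLine(noWorld("Hello there"));     // True
```

Because every combinator returns another filter, the results can be combined again, which is what makes the simplest operation the most composable one.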
Here is a quick overview of the 3 primary operations:
- Filter - returns true to keep a line/row, and false to ignore it. This is by far the simplest operation, and it is the
recommended choice whenever a filter fits your needs.
- Transform - returns a line/row/object/nameValuePair, optionally with modified content. When null is returned, the line/row
is ignored. Use a Transform when the line/row must change.
- Buffering - returns a list of lines/rows, or null to skip. When Pnyx finishes with a data source, Buffering is flushed.
Only use Buffering when the resultant data depends on multiple lines/rows or when the output contains more lines/rows than the input.
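The contracts of the three operations can be sketched with local functions. The function names (KeepLine, TransformLine, BufferLine, Flush) and the driver loop are assumptions made for this example; they mirror the behavior described above but are not Pnyx's interface signatures.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Filter: true keeps the line, false ignores it
bool KeepLine(string line) => line.Contains("world", StringComparison.OrdinalIgnoreCase);

// Transform: returns the (possibly modified) line, or null to ignore it
string? TransformLine(string line) => line.Length == 0 ? null : line.ToUpperInvariant();

// Buffering: holds one line back and emits lines as joined pairs
string? pending = null;
List<string>? BufferLine(string line)
{
    if (pending == null)
    {
        pending = line;   // nothing to emit yet
        return null;
    }
    var pair = new List<string> { pending + " " + line };
    pending = null;
    return pair;
}

// Flush: called once the source is exhausted, emits whatever is still held
List<string>? Flush()
{
    if (pending == null) return null;
    var rest = new List<string> { pending };
    pending = null;
    return rest;
}

var output = new List<string>();
foreach (string input in new[] { "hello world", "skip me", "world two", "world three" })
{
    if (!KeepLine(input)) continue;
    string? changed = TransformLine(input);
    if (changed == null) continue;
    List<string>? ready = BufferLine(changed);
    if (ready != null) output.AddRange(ready);
}
List<string>? tail = Flush();
if (tail != null) output.AddRange(tail);

Console.WriteLine(string.Join(" | ", output));
// prints: HELLO WORLD WORLD TWO | WORLD THREE
```

The last line only reaches the output because of the flush step, which is why Buffering is the most complicated of the three operations.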
Processors
Operators are the part of the API that you will most likely implement. On their own, however, they are not much use. That is where Processors come in.
For each of the 12 interfaces listed above, there is a corresponding Processor, and Processors can be chained together to build a complete "pnyx".
Normally, chaining Processors together is handled by the Fluent API. The rest of the documentation focuses on the
Fluent API, but for completeness, the following code illustrates chaining processors together by hand
(the same example as Hello World on the library page):
StreamInformation streamInformation = new StreamInformation(new Settings());
// Writes to STDOUT
ILineProcessor dest = new LineProcessorToStream(streamInformation, Console.OpenStandardOutput());
// Grep filter / processor pair
ILineFilter grepFilter = new Grep("world", caseSensitive: false);
LineFilterProcessor grepProcessor = new(grepFilter);
grepProcessor.setNextLineProcessor(dest);
// Sed transformer / processor pair
ILineTransformer sedTransformer = new SedReplace("World", "World, with love from Pnyx..", null);
LineTransformerProcessor sedProcessor = new(sedTransformer);
sedProcessor.setNextLineProcessor(grepProcessor);
// Reads from source
await using (StringStreamFactory streamFactory = new StringStreamFactory("Hello World."))
{
    await using (StreamToLineProcessor source = new StreamToLineProcessor(streamInformation, streamFactory))
    {
        source.setNextLineProcessor(sedProcessor);
        // Runs
        await source.process(); // All I/O occurs on this step
    }
}
// outputs: Hello World, with love from Pnyx...
Shims
A type may implement more than one interface. For instance, a filter may use a slightly
different algorithm for row data than for line data and therefore implement both ILineFilter and IRowFilter. However, it is not
a requirement to implement both row and line versions of the interfaces. For filters and transforms, line operators can be shimmed to work on row data.
The following example shows how SedReplace, which is an ILineTransformer, can be used on CSV data. Notice how the commas used as part of
the CSV formatting are preserved while the comma within the row data is replaced with an underscore.
await using (Pnyx p = new Pnyx())
{
    p.readString("CSV,INPUT!,\"Go, Pnyx Go\"");
    p.parseCsv();
    p.sed("[,!]", "_", "g");
    p.writeStdout();
}
// outputs: CSV,INPUT_,"Go_ Pnyx Go"
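One plausible way to picture a shim, consistent with the output above, is a wrapper that applies a line-level transform to each column of a row separately. The sketch below uses only standard .NET types; the ShimToRow helper is hypothetical, and applying the transform column by column is an assumption about the behavior, not a description of Pnyx's implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// A line-level transform: replaces every comma or exclamation mark with an underscore
Func<string, string> replace = text => Regex.Replace(text, "[,!]", "_");

// Hypothetical shim: lifts a line transform so that it runs on each column of a row
Func<List<string>, List<string>> ShimToRow(Func<string, string> lineTransform) =>
    row => row.Select(lineTransform).ToList();

// The parsed columns of: CSV,INPUT!,"Go, Pnyx Go"
var parsed = new List<string> { "CSV", "INPUT!", "Go, Pnyx Go" };
List<string> result = ShimToRow(replace)(parsed);

Console.WriteLine(string.Join(" | ", result));
// prints: CSV | INPUT_ | Go_ Pnyx Go
```

Because the transform only ever sees column values, the delimiters that separate columns never pass through it, which is why they survive unchanged when the row is written back out.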
Next
Suggested next step:
- Fluent, learn how to use the Fluent API