Basics

Pnyx is built upon a suite of interfaces, all of which can be found in pnyx.net.api. These interfaces can be roughly divided into two categories: line and row. Pnyx defines a line as a String without line termination. A row is defined as a list of strings (List<String>) without delimiters or line termination. In both cases, formatting is only used when reading from and writing to files and streams.

Internally, data is processed independently of its source. For example, CSV and tab-delimited files are both represented as rows, with formatting only being used to read from the source. Formatting (CSV vs. tab), encoding (ASCII vs. UTF), and line endings (Windows vs. Unix) are all used when reading and writing, but ignored when filtering and transforming. Pnyx preserves formatting, encoding, and line endings, which frees developers to work on higher-level functionality.
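As a rough illustration of this separation, the sketch below uses plain .NET types only, not the Pnyx API. The RowSketch class and its method names are hypothetical, and the naive Split call merely stands in for a real CSV or tab parser (it does not handle quoting).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Pnyx: formatting only matters at the edges.
public static class RowSketch
{
    // Naive parsing for illustration only; a real CSV parser must handle quotes and escapes.
    public static List<string> FromDelimited(string line, char delimiter)
    {
        return line.Split(delimiter).ToList();
    }

    // A transform sees only the row, never the format it was read from.
    public static List<string> UpperCase(List<string> row)
    {
        return row.Select(column => column.ToUpperInvariant()).ToList();
    }

    // Formatting is applied again only when the row is written out.
    public static string ToDelimited(List<string> row, char delimiter)
    {
        return string.Join(delimiter.ToString(), row);
    }
}
```

Calling FromDelimited on "a,b,c" with a comma and on "a\tb\tc" with a tab yields the same three-element row, so UpperCase behaves identically for both sources.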

Operations

For both lines and rows, Pnyx has built-in support for several operations. For each operation, a corresponding interface is available. When choosing an interface, use the simplest one that meets your needs, because the simpler interfaces support more of the higher-level operations, such as chaining and combining filters. The table below illustrates the core interfaces in Pnyx:

In the table above, Filter is the simplest operation and Buffering is the most complex. Filters can be chained together with boolean operators (and, or, not, xor) and can be shimmed (see Shims). Transforms can be shimmed, while Buffering has no special operations available.
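To make the idea of combining filters concrete, here is a minimal sketch in plain C#. It does not use the Pnyx types: the ILineFilter declared here is a local stand-in whose method name shouldKeepLine is an assumption, and ContainsFilter, AndFilter, and NotFilter are hypothetical names. Pnyx's own boolean operators may be exposed differently.

```csharp
using System;

namespace CombineSketch
{
    // Local stand-in for a line filter; the method name is an assumption,
    // so check pnyx.net.api for the real signature.
    public interface ILineFilter
    {
        bool shouldKeepLine(string line);
    }

    // Hypothetical filter: keeps lines containing the given text, ignoring case.
    public class ContainsFilter : ILineFilter
    {
        private readonly string text;

        public ContainsFilter(string text) { this.text = text; }

        public bool shouldKeepLine(string line)
        {
            return line.IndexOf(text, StringComparison.OrdinalIgnoreCase) >= 0;
        }
    }

    // Hypothetical combinator: keeps a line only when both filters keep it.
    public class AndFilter : ILineFilter
    {
        private readonly ILineFilter left;
        private readonly ILineFilter right;

        public AndFilter(ILineFilter left, ILineFilter right)
        {
            this.left = left;
            this.right = right;
        }

        public bool shouldKeepLine(string line)
        {
            return left.shouldKeepLine(line) && right.shouldKeepLine(line);
        }
    }

    // Hypothetical combinator: inverts the decision of the wrapped filter.
    public class NotFilter : ILineFilter
    {
        private readonly ILineFilter inner;

        public NotFilter(ILineFilter inner) { this.inner = inner; }

        public bool shouldKeepLine(string line)
        {
            return !inner.shouldKeepLine(line);
        }
    }
}
```

Because a filter only answers yes or no, combinators like these compose freely. That is one reason the simplest interface is usually the best choice.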

Here is a quick overview of the 3 primary operations:

Filter - returns true to keep a line/row, and false to ignore it. This is by far the simplest operation, so prefer it whenever a filter fits your needs.
Transform - returns a line/row, and optionally modifies the content. When null is returned, the line/row is ignored. Use a Transform when the line/row must change.
Buffering - returns a list of lines/rows, or null to skip. When Pnyx finishes with a data source, Buffering is flushed. Only use Buffering when the resultant data depends on multiple lines/rows or when output exceeds the input.
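The three operations can be sketched for line data as follows. This is an illustration under stated assumptions, not the library's definitions: the interfaces are local stand-ins, their method names (shouldKeepLine, transformLine, bufferingLine, endOfFile) are assumptions, and NotBlank, Trim, and PairUp are hypothetical implementations.

```csharp
using System;
using System.Collections.Generic;

namespace OperationSketch
{
    // Local stand-ins; names and signatures are assumptions, see pnyx.net.api for the real ones.
    public interface ILineFilter { bool shouldKeepLine(string line); }
    public interface ILineTransformer { string transformLine(string line); }
    public interface ILineBuffering
    {
        List<string> bufferingLine(string line); // lines to emit now, or null to emit nothing
        List<string> endOfFile();                // called once to flush when the source ends
    }

    // Filter: keeps every line that is not blank.
    public class NotBlank : ILineFilter
    {
        public bool shouldKeepLine(string line) { return !string.IsNullOrWhiteSpace(line); }
    }

    // Transform: returns a modified copy of the line.
    public class Trim : ILineTransformer
    {
        public string transformLine(string line) { return line.Trim(); }
    }

    // Buffering: joins consecutive pairs of lines, so output depends on multiple inputs.
    public class PairUp : ILineBuffering
    {
        private string pending;

        public List<string> bufferingLine(string line)
        {
            if (pending == null)
            {
                pending = line;   // hold the first line of the pair
                return null;      // nothing to emit yet
            }
            var result = new List<string> { pending + " " + line };
            pending = null;
            return result;
        }

        public List<string> endOfFile()
        {
            if (pending == null) return null;
            var result = new List<string> { pending }; // flush an unpaired trailing line
            pending = null;
            return result;
        }
    }
}
```

Only PairUp needs to keep state between calls and a flush step, which is why Buffering is worth its extra complexity only when a result spans several lines/rows.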

Processors

Operators are the part of the API that you will most likely implement. On their own, however, they are not much use. That is where Processors come in. For each of the 6 interfaces listed above, there is a corresponding Processor, and Processors can be chained together to build a complete "pnyx". Normally, the Fluent API handles the chaining of Processors. The rest of the documentation focuses on the Fluent API, but for completeness, the following code illustrates chaining processors together by hand (the same example as Hello World on the library page):

StreamInformation streamInformation = new StreamInformation(new Settings());
    
// Writes to STDOUT
ILineProcessor dest = new LineProcessorToStream(streamInformation, Console.OpenStandardOutput());

// Grep filter / processor pair
ILineFilter grepFilter = new Grep {textToFind = "world", caseSensitive = false};
ILineProcessor grepProcessor = new LineFilterProcessor { filter = grepFilter, processor = dest };

// Sed transformer / processor pair
ILineTransformer sedTransformer = new SedReplace("World", "World, with love from Pnyx..", null);
ILineProcessor sedProcessor = new LineTransformerProcessor { transform = sedTransformer, processor = grepProcessor };

// Reads from source
using (StringStreamFactory streamFactory = new StringStreamFactory("Hello World."))
{
    using (StreamToLineProcessor source = new StreamToLineProcessor(streamInformation, streamFactory))
    {
        source.setNextLineProcessor(sedProcessor);            

        // Runs 
        source.process();                // All I/O occurs on this step
    }
}

// outputs: Hello World, with love from Pnyx...

Shims

A type may implement more than one interface. For instance, a filter may have a slightly different algorithm for row data vs. line data, and therefore implement both ILineFilter and IRowFilter. However, it is not a requirement to implement both row and line versions of the interfaces. For filters and transforms, line operators can be shimmed to work on row data. The following example shows how SedReplace, which is an ILineTransformer, can be used on CSV data. Notice how the commas used as part of the CSV formatting are preserved, while the comma within the row data is replaced with an underscore.

using (Pnyx p = new Pnyx())
{
    p.readString("CSV,INPUT!,\"Go, Pnyx Go\"");
    p.parseCsv();
    p.sed("[,!]", "_", "g");  // replace ,! with _
    p.writeStdout();
}                        
// outputs: CSV,INPUT_,"Go_ Pnyx Go"
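One way to picture a shim is as an adapter that applies a line operator to each column of a row. The sketch below is a guess at the idea, not Pnyx's actual shim: the two interfaces are local stand-ins with assumed method names, and RegexReplace and RowTransformerShim are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

namespace ShimSketch
{
    // Local stand-ins; names and signatures are assumptions, see pnyx.net.api for the real ones.
    public interface ILineTransformer { string transformLine(string line); }
    public interface IRowTransformer { List<string> transformRow(List<string> row); }

    // Hypothetical line transformer: replaces every regex match in a line.
    public class RegexReplace : ILineTransformer
    {
        private readonly Regex pattern;
        private readonly string replacement;

        public RegexReplace(string pattern, string replacement)
        {
            this.pattern = new Regex(pattern);
            this.replacement = replacement;
        }

        public string transformLine(string line)
        {
            return pattern.Replace(line, replacement);
        }
    }

    // Hypothetical shim: treats each column as a line and transforms it separately,
    // so delimiters between columns are never seen by the line transformer.
    public class RowTransformerShim : IRowTransformer
    {
        private readonly ILineTransformer inner;

        public RowTransformerShim(ILineTransformer inner) { this.inner = inner; }

        public List<string> transformRow(List<string> row)
        {
            return row.Select(column => inner.transformLine(column)).ToList();
        }
    }
}
```

Wrapping new RegexReplace("[,!]", "_") in the shim and passing it the parsed row (CSV, INPUT!, Go, Pnyx Go) changes only the column contents. The commas that separated the columns are not part of the row, so they are left alone.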

Suggested next step:

Fluent, learn how to use the Fluent API
