Getting Started

Lingk's Spark Integration Recipe Engine (SIRE) enables flexible, configuration-driven data integration. SIRE is driven by YAML-based recipes that combine connectors with Spark SQL data processing statements.

YAML Resources

If you are not familiar with YAML, see the official website: http://yaml.org/. In YAML, a document's structure is expressed through indentation (one or more spaces), sequence items are denoted by a dash (each connector in a recipe begins with a dash), and key-value pairs within a map are separated by a colon.
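For example, this small fragment (in the same shape as the recipes below) shows all three constructs at once:

connectors:           # a key-value pair; the value is a sequence
-                     # a dash begins each sequence item (here, one connector)
    name: example     # key-value pairs within a map, separated by colons
    type: json        # indentation shows these keys belong to the same item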

Other helpful references:

  - YAML Tutorial
  - Handling Multiple Lines
  - Online YAML Parser

About Recipes

Recipes consist of connectors and statements. Recipes are triggered by file, data and API events, or time-based schedules.

Integration recipes enable you to build reusable integrations that contain complex orchestrations. The Lingk Recipe Language is YAML-based and consists of the following parts:

  1. Connectors (required)
  2. Pre- and Post-processors (optional)
  3. Formats (optional)
  4. Statements (required)
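As a rough sketch, a recipe's top level looks like the outline below. The connectors, formats, and statements keys all appear in the examples that follow; the preProcessors and postProcessors key names are assumptions shown here only to indicate where the optional parts would sit:

connectors:       # required: systems or inline data to read from and write to
- ...
preProcessors:    # optional (key name assumed for illustration)
- ...
formats:          # optional: reusable file-format definitions
- ...
statements:       # required: Spark SQL steps that move and shape the data
- statement: ...
postProcessors:   # optional (key name assumed for illustration)
- ...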

Quick Example

Even without a full-blown integration in mind or complete knowledge of what you want to do with Lingk yet, you can get started with Recipes. Below is a simple Recipe that does not require any external connections to data systems outside the Lingk platform. It demonstrates how to use JSON input and SIRE's Spark SQL engine to read input data and output it, just as you would when performing a system integration. We chose the JSON connector for this example because it lets you embed a static JSON structure directly in the recipe and edit it on the fly.

connectors:

-
    name: example
    type: json
    properties:
      jsonObject: >
       [
         { "courseId": "CIT163", "creditHours": 3, "title": "Intro to Programming w/C++"},
         { "courseId": "CIT313", "creditHours": 3, "title": "Web Programming II" },
         { "courseId": "CIT416", "creditHours": 3, "title": "Advanced Web Programming" },
         { "courseId": "CIT410", "creditHours": 3, "title": "E-Commerce" }
       ]

statements:
- statement: (courses) => select * from example
- statement: print courses

The example above simply takes a JSON array of objects as input, gives it a name (example) that the engine uses internally, and then, in the statements section, uses Spark SQL to query that connector, assigning the results to an internal structure named courses. The second statement outputs the results of the Spark select statement in the following format:

[Image: print output format of the select statement: https://cdn.elev.io/file/uploads/Cp0J0dUMUsMBueML5df1_VqgYbCEmN3yugKstAPZmWo/Ejh8r48dVD6haZMdZ6HF6VxqoOYJG-IR3bNnqLQWpIo/selectstatementformat-bX8.png]
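Because the statements use ordinary Spark SQL, you can go further than select *. As a small sketch against the same example connector (the result name upperLevel is just an illustrative choice), this filters down to the 400-level courses:

statements:
- statement: (upperLevel) => select courseId, title from example where courseId like 'CIT4%'
- statement: print upperLevel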


Additional Lingk Recipe Example - SFTP to SFTP

Note: replace all bracketed "[]" placeholders with your own values, removing the brackets themselves. If you get an "Array error" when processing the recipe, it is because YAML interprets the square brackets as a flow-style sequence (an array).
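For instance, with made-up credentials substituted in, the two credential properties in the connectors below would read:

      user: jdoe
      password: s3cretPassw0rd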

connectors:

-
    name: courseReader
    type: sftpReader
    format: pipe
    properties:
      user: [username]
      password: [password]
      host: my.ftp.com
      path: /course_1.csv

-
    name: courseWriter
    type: sftpWriter
    format: pipe
    properties:
      user: [username]
      password: [password]
      host: my.ftp.com
      path: /archive/course_{{ func:date_format(func:current_datetime(), 'yyyy-MM-dd_HH-mm-ss') }}.csv


formats:
-
    name: pipe
    type: delimited
    properties:
      quoteAllFields: true
      delimiter: '|'
      header: true

statements:
  - statement: (courses) => select * from courseReader
  - statement: print courses
  - statement: insert courses into courseWriter

Interpreting the Recipe Example Above

Instead of explaining the Recipe above line by line, start with the statements at the bottom. They basically say: read everything from courseReader, show the contents on the screen, then output those records to courseWriter.

How does this work? The recipe opens an SFTP connection to my.ftp.com with the credentials you supply and reads the file course_1.csv from that server. This "connection" is given the name "courseReader" and is of type "sftpReader" in the connectors section. Because you are reading a delimited file, the formats section is required and describes how that file is laid out (pipe-delimited, quoted fields, with a header row). The "sftpWriter" connector, named "courseWriter", opens a connection to the same SFTP server (though it does not have to be the same one) and writes its contents to the path given in its settings.
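Note the templated path on the writer: the date_format and current_datetime functions stamp the output file with the run time. With the pattern 'yyyy-MM-dd_HH-mm-ss', a recipe run at 9:30:00 AM on January 15, 2024 (a made-up run time) would resolve the path to:

  /archive/course_2024-01-15_09-30-00.csv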

In the statements section, the first statement assigns the name (courses) to the result of the query that follows it, "select * from courseReader". courseReader is the connector that reads the .csv file, so this statement reads all columns and rows from that file. The second statement, "print courses", outputs those results to the "Event Context" window of the editor. The final statement, "insert courses into courseWriter", copies what was read by courseReader into the file opened by courseWriter.
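Nothing requires the data to pass through unchanged. Assuming a named result can be queried by later statements (as the pattern above suggests), a sketch like the following would sort the rows before writing them; the result name sorted and the ordinal "order by 1" are illustrative choices only:

statements:
  - statement: (courses) => select * from courseReader
  - statement: (sorted) => select * from courses order by 1
  - statement: insert sorted into courseWriter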

Executing a Recipe

Once you have created a Recipe, save it by clicking the "Save" button. If your recipes use environment credentials, set up your environment before executing them.

Once you have made your environment selection, execute the Recipe by clicking the "Run" button at the top of the editor window, then click the "Start Recipe" button that appears in the Event Context window at the bottom of the editor. If your Recipe contains errors, they will be reported in the Execution Log.