Getting Started

Lingk's Spark Integration Recipe Engine (SIRE) enables flexible, configuration-driven data integration. SIRE is driven by YAML-based recipes that combine connectors with Spark SQL data processing statements.

YAML Resources

If you are not familiar with YAML, see the official website: http://yaml.org/. In YAML, a document's structure is expressed through indentation (one or more spaces), sequence items are denoted by a dash (each connector in a recipe begins with a dash), and key-value pairs within a map are separated by a colon.
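For example, this small fragment (in the same shape as the recipes below) shows all three constructs at once:

connectors:           # a key-value pair; the value is a sequence
-                     # a dash begins each sequence item (here, one connector)
    name: example     # key-value pairs within a map, separated by colons
    type: json        # indentation shows these keys belong to the same item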

Other helpful references:

  - YAML Tutorial
  - Handling Multiple Lines
  - Online YAML Parser

About Recipes

Recipes consist of connectors and statements. Recipes are triggered by file, data and API events, or time-based schedules.

Integration recipes enable you to build reusable integrations that contain complex orchestrations. The Lingk Recipe Language is YAML-based and consists of the following parts:

  1. Connectors (required)
  2. Pre- and Post-processors (optional)
  3. Formats (optional)
  4. Statements (required)
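As a rough sketch, a recipe's top level looks like the outline below. The connectors, formats, and statements keys all appear in the examples that follow; the preProcessors and postProcessors key names are assumptions shown here only to indicate where the optional parts would sit:

connectors:       # required: systems or inline data to read from and write to
- ...
preProcessors:    # optional (key name assumed for illustration)
- ...
formats:          # optional: reusable file-format definitions
- ...
statements:       # required: Spark SQL steps that move and shape the data
- statement: ...
postProcessors:   # optional (key name assumed for illustration)
- ...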

Quick Example

Even without a full-blown integration in mind or complete knowledge of what you want to do with Lingk yet, you can get started with Recipes. Below is a simple Recipe that does not require any external connections to data systems outside the Lingk platform. It demonstrates how to use JSON input and SIRE's Spark SQL engine to read input data and output it, just as you would when performing a system integration. We chose the JSON connector for this example because it lets you embed a static JSON structure directly in the recipe and edit it on the fly.

connectors:

-
    name: example
    type: json
    properties:
      jsonObject: >
       [
         { "courseId": "CIT163", "creditHours": 3, "title": "Intro to Programming w/C++"},
         { "courseId": "CIT313", "creditHours": 3, "title": "Web Programming II" },
         { "courseId": "CIT416", "creditHours": 3, "title": "Advanced Web Programming" },
         { "courseId": "CIT410", "creditHours": 3, "title": "E-Commerce" }
       ]

statements:
- statement: (courses) => select * from example
- statement: print courses

The example above simply takes a JSON array of objects as input, gives it a name (example) that the engine uses internally, and then, in the statements section, uses Spark SQL to query that connector, assigning the results to an internal structure named courses. The second statement outputs the results of the Spark select statement in the following format:

[Image: print output format of the select statement: https://cdn.elev.io/file/uploads/Cp0J0dUMUsMBueML5df1_VqgYbCEmN3yugKstAPZmWo/Ejh8r48dVD6haZMdZ6HF6VxqoOYJG-IR3bNnqLQWpIo/selectstatementformat-bX8.png]
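Because the statements use ordinary Spark SQL, you can go further than select *. As a small sketch against the same example connector (the result name upperLevel is just an illustrative choice), this filters down to the 400-level courses:

statements:
- statement: (upperLevel) => select courseId, title from example where courseId like 'CIT4%'
- statement: print upperLevel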


Additional Lingk Recipe Example - SFTP to SFTP

Note: replace all bracketed "[]" placeholders with your own values, removing the brackets themselves. If you get an "Array error" when processing the recipe, it is because YAML interprets the square brackets as a flow-style sequence (an array).
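For instance, with made-up credentials substituted in, the two credential properties in the connectors below would read:

      user: jdoe
      password: s3cretPassw0rd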

connectors:

-
    name: courseReader
    type: sftpReader
    format: pipe
    properties:
      user: [username]
      password: [password]
      host: my.ftp.com
      path: /course_1.csv

-
    name: courseWriter
    type: sftpWriter
    format: pipe
    properties:
      user: [username]
      password: [password]
      host: my.ftp.com
      path: /archive/course_{{ func:date_format(func:current_datetime(), 'yyyy-MM-dd_HH-mm-ss') }}.csv


formats:
-
    name: pipe
    type: delimited
    properties:
      quoteAllFields: true
      delimiter: '|'
      header: true

statements:
  - statement: (courses) => select * from courseReader
  - statement: print courses
  - statement: insert courses into courseWriter

Interpreting the Recipe Example Above

Instead of explaining the Recipe above line by line, start with the statements at the bottom. They basically say: read everything from courseReader, show the contents on the screen, then output those records to courseWriter.

How does this work? The recipe opens an SFTP connection to my.ftp.com with the credentials you supply and reads the file course_1.csv from that server. This "connection" is given the name "courseReader" and is of type "sftpReader" in the connectors section. Because you are reading a delimited file, the formats section is required and describes how that file is laid out (pipe-delimited, quoted fields, with a header row). The "sftpWriter" connector, named "courseWriter", opens a connection to the same SFTP server (though it does not have to be the same one) and writes its contents to the path given in its settings.
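Note the templated path on the writer: the date_format and current_datetime functions stamp the output file with the run time. With the pattern 'yyyy-MM-dd_HH-mm-ss', a recipe run at 9:30:00 AM on January 15, 2024 (a made-up run time) would resolve the path to:

  /archive/course_2024-01-15_09-30-00.csv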

In the statements section, the first statement assigns the name (courses) to the result of the query that follows it, "select * from courseReader". courseReader is the connector that reads the .csv file, so this statement reads all columns and rows from that file. The second statement, "print courses", outputs those results to the "Event Context" window of the editor. The final statement, "insert courses into courseWriter", copies what was read by courseReader into the file opened by courseWriter.
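Nothing requires the data to pass through unchanged. Assuming a named result can be queried by later statements (as the pattern above suggests), a sketch like the following would sort the rows before writing them; the result name sorted and the ordinal "order by 1" are illustrative choices only:

statements:
  - statement: (courses) => select * from courseReader
  - statement: (sorted) => select * from courses order by 1
  - statement: insert sorted into courseWriter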

Executing a Recipe

Once you have created a Recipe, save it by clicking the "Save" button. If your recipes use environment credentials, set up your environment before executing them.

Once you have made your environment selection, execute the Recipe by clicking the "Run" button at the top of the editor window, then click the "Start Recipe" button that appears in the Event Context window at the bottom of the editor. If your Recipe contains errors, they will be reported in the Execution Log.