
A case for streaming endpoints

By Nicolás Palumbo, Senior Software Developer


Let’s develop our understanding first, with a practical example of what it means to stream results.

For this I have prepared a Spring Boot web application that performs sentiment analysis of the whole of Moby Dick.

You will find my code here.

The app consists of two REST endpoints, namely /blocking and /streaming.

Both endpoints iterate through all the phrases of the book and run sentiment analysis on each one: the phrase is broken down into sentences, and a sentiment (e.g. Neutral, Negative, Positive) is returned for each sentence.

For sentiment analysis I’ve used the sentiment annotations of the Stanford CoreNLP library.

The application reads Moby Dick from a file, splits it into phrases and performs sentiment analysis on them. A single phrase looks like the following one:

 “Morning to ye! morning to ye!” he rejoined, again moving off. “Oh! I was going to warn ye against—but never mind, never mind—it’s all one, all in the family too;—sharp frost this morning, ain’t it? Good-bye to ye. Shan’t see ye again very soon, I guess; unless it’s before the Grand Jury.” And with these cracked words he finally departed, leaving me, for the moment, in no small wonderment at his frantic impudence.
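The splitting step could look roughly like the sketch below. It assumes that a phrase is a block of text delimited by blank lines, which may not be how the real application splits the book; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: the real application may split the book differently.
class PhraseSplitter {

    // Assumption: a "phrase" is a block of text delimited by one or more blank lines.
    static List<String> splitIntoPhrases(String bookText) {
        return Arrays.stream(bookText.split("\\R\\s*\\R"))
                .map(String::trim)
                .filter(phrase -> !phrase.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String sample = "First phrase, line one.\nFirst phrase, line two.\n\nSecond phrase.\n";
        for (String phrase : splitIntoPhrases(sample)) {
            System.out.println("--- phrase ---");
            System.out.println(phrase);
        }
    }
}
```

Line breaks inside a phrase are kept, which matches the \n characters visible in the originalText field of the JSON output.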

Each phrase is then transformed into a single JSON element containing the original text and a list of the sentences within the phrase, each one tagged with a sentiment. The job is finished when all phrases in the book have been processed and returned as a list in JSON format.


{
  "originalText": "“Morning to ye! morning to ye!” he rejoined, again moving off. “Oh! I\nwas going to warn ye against—but never mind, never mind—it’s all one,\nall in the family too;—sharp frost this morning, ain’t it? Good-bye to\nye. Shan’t see ye again very soon, I guess; unless it’s before the\nGrand Jury.” And with these cracked words he finally departed, leaving\nme, for the moment, in no small wonderment at his frantic impudence.",
  "analysedSentences": [
    {
      "sentence": "“Morning to ye!",
      "sentiment": "Neutral"
    },
    {
      "sentence": "morning to ye!”",
      "sentiment": "Neutral"
    },
    {
      "sentence": "he rejoined, again moving off.",
      "sentiment": "Positive"
    },

    {
      "sentence": "And with these cracked words he finally departed, leaving\nme, for the moment, in no small wonderment at his frantic impudence.",
      "sentiment": "Negative"
    }
  ]
}
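As an illustration, the element shown above could be modelled with two small Java records (Java 16 or later). The type names are hypothetical; only the field names are taken from the JSON. In a Spring Boot application a JSON library such as Jackson would typically take care of the serialisation, so none is shown here.

```java
import java.util.List;

// Hypothetical type names; the component names mirror the JSON keys shown above.
record AnalysedSentence(String sentence, String sentiment) {}

record AnalysedPhrase(String originalText, List<AnalysedSentence> analysedSentences) {}

class AnalysedPhraseDemo {

    // Builds a shortened version of the example element (only the first three sentences).
    static AnalysedPhrase sample() {
        return new AnalysedPhrase(
                "\u201CMorning to ye! morning to ye!\u201D he rejoined, again moving off.",
                List.of(
                        new AnalysedSentence("\u201CMorning to ye!", "Neutral"),
                        new AnalysedSentence("morning to ye!\u201D", "Neutral"),
                        new AnalysedSentence("he rejoined, again moving off.", "Positive")));
    }

    public static void main(String[] args) {
        AnalysedPhrase phrase = sample();
        for (AnalysedSentence s : phrase.analysedSentences()) {
            System.out.println(s.sentiment() + ": " + s.sentence());
        }
    }
}
```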


Sentiment analysis is computationally expensive: it takes more than an hour to process the whole of Moby Dick on my machine. The way the application reads phrases is not the most efficient one, but it helps to illustrate the point.

In the example below, both endpoints process the first 5 phrases of the book. The streaming version starts downloading early on, while the blocking endpoint has to process all elements before returning anything.

[Figure: json 1.gif]

[Figure: json2.gif]
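The difference in behaviour can also be sketched in plain Java, without Spring or CoreNLP. This is not the application's code: the class, the method names and the placeholder analysis function are all assumptions made for illustration. The sketch records the order in which phrases are analysed and results are sent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical stand-in for the two endpoints; not the application's actual code.
class BlockingVsStreaming {

    // Blocking: analyse every phrase, hold all results in memory, then hand over the full list.
    static <T, R> void blocking(List<T> phrases, Function<T, R> analyse, Consumer<List<R>> respond) {
        List<R> results = new ArrayList<>();
        for (T phrase : phrases) {
            results.add(analyse.apply(phrase));
        }
        respond.accept(results);
    }

    // Streaming: emit each result as soon as it is ready; nothing is accumulated here.
    static <T, R> void streaming(List<T> phrases, Function<T, R> analyse, Consumer<R> emit) {
        for (T phrase : phrases) {
            emit.accept(analyse.apply(phrase));
        }
    }

    // Records the order of events so the two styles can be compared.
    static List<String> trace(boolean stream) {
        List<String> events = new ArrayList<>();
        List<String> phrases = List.of("p1", "p2", "p3");
        Function<String, String> analyse = p -> {
            events.add("analyse " + p);
            return p + ":Neutral"; // placeholder for the expensive sentiment analysis
        };
        if (stream) {
            streaming(phrases, analyse, result -> events.add("send " + result));
        } else {
            blocking(phrases, analyse, all -> events.add("send " + all));
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println("streaming: " + trace(true));
        System.out.println("blocking:  " + trace(false));
    }
}
```

In the streaming trace every "analyse" event is immediately followed by a "send" event, while the blocking trace has a single "send" at the very end.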


Why choose a streaming endpoint over a blocking one?


A streaming endpoint frees memory more frequently and yields results as soon as they are available. If there is a failure, at least part of the content will already have been returned. A blocking solution, by contrast, will not return any results until the very last element has been processed.

To conclude, I’d have liked to use an example that more accurately highlights the advantages of streaming in terms of memory usage, but hopefully the principle and my thought process come across.
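The failure case can be illustrated with a small simulation. Everything in this sketch is an assumption: the analysis function is a placeholder that fails on one specific input, and a StringWriter stands in for the client's side of the connection.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation; a StringWriter plays the role of the client connection.
class PartialResultsOnFailure {

    // Placeholder analysis that fails on one specific phrase (simulated failure).
    static String analyse(String phrase) {
        if (phrase.equals("bad")) {
            throw new IllegalStateException("analysis failed for: " + phrase);
        }
        return phrase + ":Neutral";
    }

    // Streaming: each result is written and flushed before the next phrase is analysed.
    static void streamTo(PrintWriter out, List<String> phrases) {
        for (String phrase : phrases) {
            out.print(analyse(phrase) + "\n");
            out.flush();
        }
    }

    // Blocking: nothing is written until every phrase has been analysed.
    static void blockTo(PrintWriter out, List<String> phrases) {
        List<String> results = new ArrayList<>();
        for (String phrase : phrases) {
            results.add(analyse(phrase));
        }
        for (String result : results) {
            out.print(result + "\n");
        }
        out.flush();
    }

    // Runs one style against input whose third phrase fails; returns what the client received.
    static String received(boolean stream) {
        StringWriter client = new StringWriter();
        PrintWriter out = new PrintWriter(client);
        List<String> phrases = List.of("p1", "p2", "bad", "p4");
        try {
            if (stream) {
                streamTo(out, phrases);
            } else {
                blockTo(out, phrases);
            }
        } catch (IllegalStateException e) {
            // The failure ends the response; whatever was already written stays with the client.
        }
        return client.toString();
    }

    public static void main(String[] args) {
        System.out.println("streaming received: [" + received(true).trim().replace("\n", ", ") + "]");
        System.out.println("blocking received:  [" + received(false).trim() + "]");
    }
}
```

With the streaming style the client keeps the two results produced before the failure, while the blocking style delivers nothing. Over a real HTTP connection the client would still need some way to tell that the response was cut short.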


What technologies do you use in your streaming data pipelines?  Feel free to drop me a Tweet.
