Do big data stream processing in the stream way

By Eric Ma | Nov 27, 2018 | Nov 21, 2019

Reading: “Years in Big Data. Months with Apache Flink. 5 Early Observations With Stream Processing”: https://data-artisans.com/blog/early-observations-apache-flink

The article suggests adopting the right solution, Flink, for big data processing. Flink is interesting and is built for stream processing.

The broader takeaway may be to solve each problem with the right solution. We have seen many painful attempts in the past, and still see them in current practice: handling very large-scale data in traditional databases, processing unstructured data in relational databases, doing graph processing with tables, doing stream processing with micro-batches, and so on. A specific problem should be handled by a solution built for that problem, and that solution can be the most efficient and convenient one.

Some good examples and points from the article.

“In reality, however, processing data with as low latency as possible has been a challenge for a long time….a customer asked me how to produce an up-to-date aggregation over a tumbling five-minute window of a growing table using Hive.”

“the customer and business user really need: a representation of data as a stream and the ability to do in-stream complex/stateful analytics.”

“Customers and end-users wrangle with the latency gap in all kinds of interesting and expensive ways.”

“it’s refreshing to be given constructs of stream, state, time and snapshots as the building blocks of event processing rather than incomplete concepts of keys, values, and execution phases.”

“The first approach is to use batch as a starting point then try to build streaming on top of batch. This likely won’t meet strict latency requirements, though, because micro-batching to simulate streaming requires some fixed overhead–hence the proportion of the overhead increases as you try to reduce latency.”
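The overhead argument in this quote can be made concrete with a little arithmetic. The sketch below assumes a very simple model that is not from the article: every micro-batch pays the same fixed cost (scheduling, task launch and so on), here a hypothetical 50 ms, regardless of how short the batch interval is.

```python
def overhead_fraction(batch_interval_ms: float, fixed_overhead_ms: float) -> float:
    """Share of each micro-batch interval consumed by the fixed per-batch cost.

    Simplified model (an assumption for illustration): the overhead is a
    constant per batch and cannot exceed the interval itself.
    """
    return min(1.0, fixed_overhead_ms / batch_interval_ms)


FIXED_OVERHEAD_MS = 50.0  # hypothetical per-batch cost, not a measured value

# Shrinking the interval lowers latency but raises the share lost to overhead.
for interval_ms in (5000, 1000, 500, 100):
    share = overhead_fraction(interval_ms, FIXED_OVERHEAD_MS)
    print(f"interval {interval_ms:>5} ms -> overhead share {share:.0%}")
```

Under these assumed numbers, the overhead share grows from 1% at a 5-second interval to 50% at a 100 ms interval, which is the effect the quote describes.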

“However we asked ourselves if the data is being generated in real-time, why must it not be processed downstream in real-time?”

“requirements around low latency processing and complex analysis cannot be met in an inexpensive, scalable and fault-tolerant way.”
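To illustrate the five-minute tumbling window aggregation mentioned in the first quote, here is a minimal pure-Python sketch of the "stream way": events are processed one at a time, a small piece of state is kept per window, and a result is emitted as soon as a window closes. This is not Flink code and not the article's implementation. The event shape `(timestamp, key, value)` and the function name `tumbling_window_sums` are assumptions made for the example, and it assumes events arrive in event-time order with no late data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

WINDOW_SECONDS = 5 * 60  # five-minute tumbling window

# Hypothetical event shape: (event time in epoch seconds, key, value)
Event = Tuple[int, str, float]


def tumbling_window_sums(
    events: Iterable[Event], window_seconds: int = WINDOW_SECONDS
) -> Iterator[Tuple[int, Dict[str, float]]]:
    """Yield (window_start, {key: sum}) each time a window closes.

    Assumes events arrive in non-decreasing event-time order.
    Windows that contain no events are not emitted.
    """
    current_start = None
    sums: Dict[str, float] = defaultdict(float)  # per-window state
    for ts, key, value in events:
        start = ts - ts % window_seconds  # window this event belongs to
        if current_start is None:
            current_start = start
        elif start != current_start:
            # The previous window is complete: emit it and reset the state.
            yield current_start, dict(sums)
            sums.clear()
            current_start = start
        sums[key] += value
    if current_start is not None:
        # Flush the last (possibly still open) window when the input ends.
        yield current_start, dict(sums)


if __name__ == "__main__":
    sample = [(0, "a", 1.0), (120, "a", 2.0), (299, "b", 5.0),
              (300, "a", 4.0), (910, "b", 1.0)]
    for window_start, totals in tumbling_window_sums(sample):
        print(window_start, totals)
```

The point of the sketch is that nothing re-scans a growing table: each event updates the state once, and each window's result is available the moment the next window starts. A real stream processor adds what this sketch leaves out, such as out-of-order events, fault-tolerant state and parallelism.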

One Comment

Yassir says:

May 7, 2019 at 12:43 am

Hi,
interesting article. I want to know if there is a research field that focuses on processing data in motion vs. data at rest, because analysing IoT data streams in transit is better than batch processing in the cloud.
