Ingesting RFID Data into Big Data Databases

Using Sqoop to read RFID data into the Hadoop Distributed File System for Big Data Analysis

RFID tags are commissioned to contain key information that uniquely identifies the asset they are attached to. In turn, RFID readers are configured to collect the data from the RFID tags and store it in an application database. The enterprise computers run RFID middleware that filters the data before it is stored in the database.

Unlike bar codes, RFID tags are read constantly. Typically, the RFID application does not want to store every individual tag read for an asset being scanned, so the RFID middleware filters the data and passes on only certain key events. RFID data can also be formatted to flow into an existing enterprise resource planning (ERP) application database. The intent of the application is to collect the RFID data as information for business intelligence and analysis.
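To make the filtering step concrete, here is a minimal sketch of one way middleware might collapse a stream of raw reads into key events. The TagRead fields, the filter_reads function, and the five-second window are all assumptions made for illustration; real RFID middleware products define their own event rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagRead:
    epc: str          # tag identifier as a hex string
    reader_id: str    # which reader saw the tag (hypothetical field)
    timestamp: float  # seconds since an arbitrary starting point


def filter_reads(reads, window_seconds=5.0):
    """Yield a read only when the same reader has not seen that tag
    during the preceding `window_seconds`.

    Continuous re-reads of a tag sitting in front of a reader are
    suppressed; the tag is reported again only after a quiet gap.
    """
    last_seen = {}
    for read in sorted(reads, key=lambda r: r.timestamp):
        key = (read.epc, read.reader_id)
        previous = last_seen.get(key)
        if previous is None or read.timestamp - previous >= window_seconds:
            yield read
        # Track every read, including suppressed ones, so the gap is
        # measured from the most recent sighting.
        last_seen[key] = read.timestamp
```

With this rule, a tag read once per second at a dock door produces a single event when it arrives, rather than one database row per second.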

More and more, these ever-increasing amounts of data (from RFID, sensors, and other high-volume inputs) are being passed to big data systems built on Hadoop technologies. The Hadoop Distributed File System (HDFS) is the storage layer that holds these volumes of big data, which are subsequently processed by tools in the Hadoop ecosystem (e.g., MapReduce, Spark, Pig, Hive) or passed on to NoSQL databases for storage.

Since RFID data is traditionally collected in a structured, tabular format (e.g., relational databases, spreadsheets), it needs to be ingested into the big data distributed file system before Hadoop can process it.

Sqoop (http://sqoop.apache.org) is a widely used tool for reading structured data and writing it into HDFS. Sqoop (short for “SQL to Hadoop”) is designed to transfer data between relational database servers and Hadoop. It is used to import data from relational databases such as MySQL and Oracle into HDFS, and to export data from HDFS back to those relational databases.
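As a rough illustration of the two directions, the sketch below assembles the command-line arguments for a Sqoop table import and a Sqoop export. The helper function names, the JDBC URL, the user, the table names, and the HDFS paths are hypothetical placeholders; only the sqoop options themselves (--connect, --username, --password-file, --table, --target-dir, --num-mappers, --export-dir) come from the Sqoop 1 command line.

```python
import shlex


def sqoop_import_args(jdbc_url, username, password_file, table, target_dir, mappers=4):
    """Argument list for importing one relational table into HDFS."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(mappers),
    ]


def sqoop_export_args(jdbc_url, username, password_file, table, export_dir):
    """Argument list for exporting HDFS files back into a relational table."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--export-dir", export_dir,
    ]


if __name__ == "__main__":
    # All connection details below are made-up example values.
    args = sqoop_import_args(
        "jdbc:mysql://db.example.com/rfid", "etl_user",
        "/user/etl/.db_password", "tag_reads", "/data/rfid/tag_reads",
    )
    print(shlex.join(args))  # the equivalent shell command line
```

On a machine where Sqoop is installed, the list could be handed to subprocess.run(args, check=True); here it is only printed so the command can be inspected first.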

RFID data is usually stored in table rows after being read. Typically, each row contains the unique RFID serial number of the tagged item, which is used as the table index, along with the asset’s name or model number, information about that product, and fields recording when and where the tag was interrogated. The unique RFID serial number is called the Electronic Product Code (EPC) ID; in most implementations it is 96 bits long and should be unique within a given deployment. (The GS1 EPC Tag Data Standard defines how the fields should be encoded to guarantee this uniqueness.)
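The row layout described above can be sketched as a small record type. The field names are hypothetical, and the EPC in the usage note is just an arbitrary 24-digit hex string; the only fact taken from the text is that a 96-bit EPC corresponds to 24 hexadecimal digits (96 / 4).

```python
from dataclasses import dataclass
from datetime import datetime

EPC_BITS = 96
EPC_HEX_DIGITS = EPC_BITS // 4  # 24 hexadecimal characters
_HEX_CHARS = set("0123456789ABCDEF")


def normalize_epc(epc_hex):
    """Return the EPC as upper-case hex, or raise ValueError if it is
    not exactly 96 bits (24 hex digits)."""
    cleaned = epc_hex.strip().upper()
    if len(cleaned) != EPC_HEX_DIGITS:
        raise ValueError(f"expected {EPC_HEX_DIGITS} hex digits, got {len(cleaned)}")
    if not set(cleaned) <= _HEX_CHARS:
        raise ValueError("EPC contains non-hexadecimal characters")
    return cleaned


@dataclass(frozen=True)
class TagReadRow:
    epc: str              # table index: the 96-bit EPC as 24 hex digits
    asset_name: str       # asset name or model number
    description: str      # information about the product
    read_time: datetime   # when the tag was interrogated
    read_location: str    # where the tag was interrogated
```

For example, normalize_epc(" 3034257bf7194e4000001a85 ") returns "3034257BF7194E4000001A85", which can then be used as the epc value of a TagReadRow.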

Although Apache Hadoop was originally created to process huge amounts of unstructured data, it is now also used to process the structured data found in relational databases. And although it is possible to configure and run data migration manually, a tool such as Sqoop automates and simplifies getting data from a relational source into HDFS, and vice versa.

Sqoop uses the import command to import data. It also supports custom SQL queries, which let you pick and choose which data to import. Sqoop can also import data directly into Hadoop data warehouses such as Apache Hive.
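A hedged sketch of both options follows: one helper builds a free-form query import, the other a Hive import. The helper names, table, columns, and paths are invented for the example. The Sqoop-specific details it relies on are that a --query import must include the literal $CONDITIONS placeholder in its WHERE clause, needs an explicit --target-dir, and needs --split-by when more than one mapper is used; --hive-import and --hive-table load the result into Hive.

```python
import shlex


def sqoop_query_import_args(jdbc_url, username, password_file, query,
                            target_dir, split_by, mappers=4):
    """Argument list for a free-form query import into HDFS."""
    if "$CONDITIONS" not in query:
        # Sqoop substitutes this token with each mapper's slice of the data.
        raise ValueError("query must contain the $CONDITIONS placeholder")
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--query", query,
        "--split-by", split_by,
        "--target-dir", target_dir,
        "--num-mappers", str(mappers),
    ]


def sqoop_hive_import_args(jdbc_url, username, password_file, table, hive_table):
    """Argument list for importing a relational table into a Hive table."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--hive-import",
        "--hive-table", hive_table,
    ]


if __name__ == "__main__":
    # Hypothetical table and column names; only one dock's reads are selected.
    query = ("SELECT epc, asset_name, read_time, read_location FROM tag_reads "
             "WHERE read_location = 'dock-1' AND $CONDITIONS")
    print(shlex.join(sqoop_query_import_args(
        "jdbc:mysql://db.example.com/rfid", "etl_user", "/user/etl/.db_password",
        query, "/data/rfid/dock1_reads", "epc",
    )))
```

Passing the arguments as a list (or quoting the query in single quotes, as shlex.join does) keeps the shell from trying to expand $CONDITIONS itself.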

So if your application needs to send RFID data into a big data Hadoop implementation, Sqoop is the way to go, especially if you are currently storing your RFID data in a relational database. Once your RFID data is in HDFS, it is ready for your MapReduce jobs to analyze it, spot correlations, and produce analytics that help you understand the data further. Big data analytics now lets you look at all of the data being created and stored in your enterprise data stores, whatever its volume, variety, velocity, and veracity.
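To give a feel for the kind of analysis meant here, the sketch below counts tag reads per location using a map step and a reduce step. It is an in-memory stand-in, not a real Hadoop job. It assumes the imported files use Sqoop's default text layout (one record per line, comma-separated fields), and the four-column order and sample values are hypothetical. It also assumes no field contains a comma.

```python
from collections import Counter


def map_line(line):
    """Map step: turn one comma-delimited record into (read_location, 1)."""
    epc, asset_name, read_time, read_location = line.rstrip("\n").split(",")
    return read_location, 1


def reduce_counts(pairs):
    """Reduce step: sum the values emitted for each key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def reads_per_location(lines):
    """Run the map and reduce steps over an iterable of record lines."""
    return reduce_counts(map_line(line) for line in lines if line.strip())


if __name__ == "__main__":
    sample = [
        "3034257BF7194E4000001A85,Pallet jack,2023-01-05 10:00:00,dock-1\n",
        "3034257BF7194E4000001A86,Hand truck,2023-01-05 10:02:00,dock-2\n",
        "3034257BF7194E4000001A85,Pallet jack,2023-01-05 11:30:00,dock-1\n",
    ]
    print(reads_per_location(sample))
```

In a Hadoop deployment the same two steps would run as a MapReduce or Spark job across the files in HDFS, with the framework handling the grouping between map and reduce.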

For more information on RFID, including readers, printers, labels & tags, and accessories, please contact us at the Gateway RFID store (https://gatewayrfidstore.com/).