Drupal and Big Data sets. 100 million Drupal Entities using Solr. How the Department of Energy predicts Biofuel availability.

mr_scumbag

The Department of Energy (DOE) conducts a study of available crops that can be used for biofuel production every five or six years. It includes historical data as well as a 'predict the future' component. The data is broken down to county level, across multiple years, multiple crops, and multiple price points.
This is called the Billion Ton Study. The latest edition is due for release in 2016, and it contains more than 100 million rows of data.
The Drupal 7 site is https://www.bioenergykdf.net, and the "Billion Ton 16" data is currently being prepared for distribution on this site. It should be live by DrupalCon 2016.

The site is managed by Oak Ridge National Labs (of Manhattan Project fame, plus many other technologies) and built by Drupal developers at Code Journeymen LLC.
My co-presenter from ORNL will join me at DrupalCon for this presentation, so client and agency will present together: https://www.drupal.org/u/atrentm
(Aaron has been using Drupal for a long time. I can't believe he didn't have a D.O account!)


This gives the DOE the ability to predict how much biofuel would be available, and allows the DOE to work towards an energy-independent United States.

Each data row is a Drupal entity and is indexed in Solr. This lets us search the vast amount of data, pull out the relevant million or so rows, aggregate them, and then do Drupally things with the results, all in around 30 seconds of processing.
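
As a rough illustration of that search-and-aggregate step, here is a minimal Python sketch. It is not the site's actual code (which runs inside Drupal through search_api), and the Solr field names (crop, year, price, county_fips, production_dt) are hypothetical. It builds a standard Solr /select query string that narrows the index to one crop, year, and price point using filter queries (fq), then sums the returned rows per county, which is the kind of per-county total that ends up on a map.

```python
from collections import defaultdict
from urllib.parse import urlencode


def build_solr_params(crop, year, price, rows=1000):
    """Build a Solr /select query string for one crop, year and price point.

    Field names are hypothetical; q, fq, fl, rows and wt are standard Solr
    request parameters. This fetches one page of results; a real run over
    a million rows would page through many such requests.
    """
    params = [
        ("q", "*:*"),                 # match everything...
        ("fq", f'crop:"{crop}"'),     # ...then narrow with filter queries
        ("fq", f"year:{year}"),
        ("fq", f"price:{price}"),
        ("fl", "county_fips,crop,production_dt"),  # only the fields we aggregate
        ("rows", str(rows)),
        ("wt", "json"),
    ]
    return urlencode(params)


def aggregate_by_county(docs):
    """Sum the (hypothetical) production_dt field per county FIPS code."""
    totals = defaultdict(float)
    for doc in docs:
        totals[doc["county_fips"]] += float(doc.get("production_dt", 0.0))
    return dict(totals)


if __name__ == "__main__":
    print(build_solr_params("Switchgrass", 2022, 60))
    sample = [
        {"county_fips": "47001", "production_dt": 10.5},
        {"county_fips": "47001", "production_dt": 4.5},
        {"county_fips": "47093", "production_dt": 3},
    ]
    print(aggregate_by_county(sample))
```

The fixed criteria go in separate fq parameters rather than in q because Solr caches each filter query independently, so repeated requests for the same crop or year can reuse those cached filters.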

The resulting data about the crops is then mapped for visualization.

Previous Billion Ton updates could handle only about 100,000 rows of data (taking over 3 minutes) before breaking, and that data set was only in the 30-million-row range.

Attendees need no special Drupal or development knowledge. This is more of a case study in large data sets and Drupal using Solr (via the search_api module).

Attendees will learn that Drupal and Solr can handle VERY large, complex data sets, keep them searchable, and do it quickly.

I have not given this presentation yet, as the ORNL legal team is still approving it. We don't expect any problems; it's just a formality.

I will present this at our DUG before DrupalCon, and anywhere else that will let me, probably including the ChaDev group: http://www.meetup.com/chadevs

Session Track

Business

Experience Level

Beginner

Drupal Version