Hutility

Reading Excel Files

There are multiple methods in HuLib for reading Excel files. The latest addition is the BufferedExcelReader. BufferedExcelReader is meant to replace older methods as it is much, much more performant. It achieves this by grabbing all Excel data in bulk at the beginning and then providing methods to interact with the reader directly to get data out.

Creating an Instance

To create an instance, just call the constructor with the filename:

javaCopyEditBufferedExcelReader reader = new BufferedExcelReader(fileName);

Now you are ready to read from it! If you want to switch the sheet you are reading you can use:

arduinoCopyEditreader.GoToSheet("MySheetName");

Note that initially, the sheet will be the active workbook sheet.

Reading Data Directly

There are two main ways to do the reading.

1. Using the Reader

This is convenient when column numbers can change and you want to read data row by row and not have to keep track of the position manually:

csharpCopyEditstring[] readLine = reader.ReadStrings();
object[] readData = reader.ReadLine();

ReadStrings() will convert everything to a string while ReadLine() will give you the raw types. Each time you call this the program will jump to the next line so you just have to deal with the data that comes out of it.

2. Use the Sheet Directly

If you want to use direct locations then you can use the sheet directly:

pgsqlCopyEditBufferedExcelReader.Sheet sheet = reader.SelectedSheet;
object read = sheet.Read("B8");
read = sheet.Read("B", 8);
read = sheet.Read(8, 2);

These three methods return the same thing. Note that the third call is using the more common order of arguments where row is first (which looks reversed from the first two methods).

Mapping with BufferedExcelReader

The quickest way to read tables off of Excel is to take advantage of the Mapping framework.

To use the mapping framework we need two things: a model and an object that implements the mappable interface. In this case, a Sheet is a mappable object so we can use it to generate classes for us.

1. The Model

The model is just a class with properties. The only difference is we must define the Mapping attribute on each property to identify the column that will contain the data.

csharpCopyEditpublic class ReceiptLine
{ 
    [Mapping("A")]
    public string Store { get; set; }

    [Mapping("B")]
    public string Invoice { get; set; }

    [Mapping("C")]
    public string Date { get; set; }
}

You can see that the argument for the Mapping attribute is the column that the data will be pulled from.

2. The Mappable Object

Now we want to populate a collection of ReceiptLine from an Excel file. This is trivial:

javaCopyEditBufferedExcelReader reader = new BufferedExcelReader(fileName);
BufferedExcelMap bufferedExcelMap = reader.SelectedSheet.GetMap(9);
List<ReceiptLine> receiptLines = bufferedExcelMap.CreateClasses();

First we create the reader, then we use the selected sheet and call GetMap()—this returns an object that we use for mapping. The argument 9 is the row where the data starts (not the header). In this example, A9 will be the start of the data. Using this BufferedExcelMap object, we just use CreateClasses() to create all the rows of data as we would from other mappable objects.

Another example:

javaCopyEditpublic IEnumerable<ReceiptLine> GetReceiptLines(string fileName)
{
	BufferedExcelReader reader = new BufferedExcelReader(fileName);
	BufferedExcelReader.Sheet sheet = reader.Sheets.First(s => s.Name == "Invoice Summary");
	BufferedExcelMap bufferedExcelMap = sheet.GetMap(7);
	List<ReceiptLine> fleetLines = bufferedExcelMap.CreateClasses();

	return fleetLines;
}

Performance

Comparing this to using the older Excel method that did not preload data on a file that had only about 200 rows:

  • Old method: 49 seconds
  • New method: 2 seconds

Note that now the time is spent in the constructor. The mapping part is very quick.

The difference is entirely due to reducing the number of interop calls into Excel by fetching all data at the start.

Leave A Comment

All fields marked with an asterisk (*) are required