Hi there.
I have an issue with PdfDocumentProcessor.
I simply want to check whether the first page of a PdfDocument contains only one picture and no text.
I then do the following:
var processor = new PdfDocumentProcessor();
processor.LoadDocument(pdfDocument);
var page = processor.Document.Pages[0];
var areaWholePage = new PdfDocumentArea(pageNumber, page.MediaBox);
var images = processor.GetImages(areaWholePage);
var texts = processor.GetText(areaWholePage);
On the line
var texts = processor.GetText(areaWholePage);
the processor fails with a NullReferenceException.
With a little debugging and reflection, I can now do the following:
var dataSelector = (PdfDataSelector)typeof(PdfDocumentProcessor).GetField("dataSelector", BindingFlags.NonPublic | BindingFlags.Instance).GetValue(processor);
var textSelector = (PdfTextSelector)typeof(PdfDataSelector).GetField("textSelector", BindingFlags.NonPublic | BindingFlags.Instance).GetValue(dataSelector);
var selection = textSelector.GetSelection(areaWholePage);
The returned selection is now null, which in theory is a sign that there is no text on the page?
I am not sure if this is the right way to check that.
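A less invasive alternative to reflecting into private fields might be to wrap the public GetText call itself. The following is only a minimal sketch: the class name PdfTextProbe and the method name HasNoText are hypothetical, and it assumes (unconfirmed) that the NullReferenceException is thrown only when the area contains no text.

```csharp
using System;
using DevExpress.Pdf;

// Hypothetical helper, not part of the DevExpress API.
static class PdfTextProbe
{
    // Returns true when the given area appears to contain no text.
    public static bool HasNoText(PdfDocumentProcessor processor, PdfDocumentArea area)
    {
        try
        {
            // GetText is the same public call that fails in the snippet above.
            string text = processor.GetText(area);
            return string.IsNullOrWhiteSpace(text);
        }
        catch (NullReferenceException)
        {
            // Assumption: the exception is a symptom of "no text in this area".
            // If it can also occur for other reasons, this would hide real errors.
            return true;
        }
    }
}
```

With this, the check would read `bool noText = PdfTextProbe.HasNoText(processor, areaWholePage);`. It avoids depending on private field names, but it still treats an exception as a normal result, so it is a stopgap rather than a proper solution.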
Am I using PdfDocumentProcessor incorrectly, or is this possibly a bug?
Thank you in advance
Hello Thomas,
The code you are using is correct. I have reproduced the behavior you described in a sample project and passed this ticket to our developers for further research.
Please bear with us. We will notify you as soon as we make any progress.