Bug Report T298289

Text Extraction - NullReferenceException is raised on an attempt to execute the PdfDocumentProcessor.GetText method for a document area that does not contain text

created 9 years ago (modified 9 years ago)

Hi there.

I have an issue with a PdfDocumentProcessor.

I simply want to check whether the first page of a PdfDocument contains only one picture and no text.
To do that, I run the following:

C#
var processor = new PdfDocumentProcessor();
processor.LoadDocument(pdfDocument);
var page = processor.Document.Pages[0];
var areaWholePage = new PdfDocumentArea(pageNumber, page.MediaBox);
var images = processor.GetImages(areaWholePage);
var texts = processor.GetText(areaWholePage);

On the line

C#
var texts = processor.GetText(areaWholePage);

the processor fails with a NullReferenceException.

With a little debugging and reflection, I can now do the following:

C#
var dataSelector = (PdfDataSelector)typeof(PdfDocumentProcessor)
    .GetField("dataSelector", BindingFlags.NonPublic | BindingFlags.Instance)
    .GetValue(processor);
var textSelector = (PdfTextSelector)typeof(PdfDataSelector)
    .GetField("textSelector", BindingFlags.NonPublic | BindingFlags.Instance)
    .GetValue(dataSelector);
var selection = textSelector.GetSelection(areaWholePage);

The returned selection is now null. Is that, in theory, a sign that there is no text on the page?
I am not sure whether this is the right way to check that.
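
A less invasive alternative to reflection might be to treat the exception itself as "no text found". The sketch below is only a stopgap under that assumption: SafeTextExtraction and GetTextOrEmpty are hypothetical names, not part of the DevExpress API, and catching NullReferenceException this broadly can also hide unrelated bugs.

```csharp
using System;

// Hypothetical helper, not part of DevExpress.
public static class SafeTextExtraction
{
    // Runs the supplied extraction delegate and returns its result.
    // A null result or a NullReferenceException is mapped to an empty string.
    public static string GetTextOrEmpty(Func<string> extract)
    {
        if (extract == null) throw new ArgumentNullException(nameof(extract));
        try
        {
            return extract() ?? string.Empty;
        }
        catch (NullReferenceException)
        {
            // Assumption: in this scenario the exception means the area has no text.
            return string.Empty;
        }
    }
}

// Possible usage with the variables from the snippet above:
//   var text = SafeTextExtraction.GetTextOrEmpty(() => processor.GetText(areaWholePage));
//   bool pageHasNoText = string.IsNullOrWhiteSpace(text);
```

Passing the call as a delegate keeps the helper independent of the PDF types, so it can be dropped once GetText handles text-free areas on its own.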

Am I using PdfDocumentProcessor incorrectly, or is this possibly a bug?

Thank you in advance

Comments (1)
DevExpress Support Team 9 years ago

    Hello Thomas,
    The code you are using is correct. I have reproduced the behavior you described in a sample project and passed this ticket to our developers for further research.
    Please bear with us. We will notify you as soon as we make any progress.

    Answers approved by DevExpress Support

    created 9 years ago (modified 9 years ago)

    We have fixed the issue described in this ticket and will include the fix in our next maintenance update. To apply this solution before the official update, request a hotfix by clicking the corresponding link for product versions you require.

    Note: Hotfixes may be unavailable for beta versions and updates that are about to be released.
