Wednesday, November 20, 2024

How to Configure a Reverse Proxy on Windows Using NGINX

A reverse proxy acts as an intermediary server between clients and backend servers, forwarding requests and responses. Using NGINX as a reverse proxy enhances performance, strengthens security, simplifies application routing, and facilitates session sharing.

Refer to the following posts to understand more about the reverse proxy - Apache Reverse Proxy: Content From Different Websites | by Albin Issac | The Startup | Medium

Implementing Cross-Domain Cookie Handling for Seamless API Integration | by Albin Issac | Tech Learnings | Medium



Steps to Set Up NGINX as a Reverse Proxy on Windows:

 Install NGINX:

       
start nginx      
 

Configure Reverse Proxy:

  • Open the nginx.conf file in C:\nginx\conf\nginx.conf
Add the reverse proxy configuration: Here is the minimal configuration to proxy the requests to the backend server. You can add multiple servers and locations.

           
#user  nobody;
worker_processes  1;

#error_log  logs/error.log;
#error_log  logs/error.log  notice;
#error_log  logs/error.log  info;

#pid        logs/nginx.pid;


events {
    worker_connections  1024;
}



http {
    include       mime.types;
    default_type  application/octet-stream;
	
	log_format upstream_logging '[$time_local] $remote_addr - $remote_user - $server_name '
                                 'to: $upstream_addr: $request '
                                 'status $status upstream_response_time $upstream_response_time '
                                 'request_time $request_time '
                                 'headers: "$http_user_agent, 
                                 $http_x_forwarded_for, $http_x_cip_client_id" '
                                 'upstream_headers: "$upstream_http_content_type, 
                                 $upstream_http_content_length"';

    sendfile        on;

    keepalive_timeout  65;
	
	
	

server {
    listen 8000;
    listen [::]:8000;
	
    #SSL Support
    #listen 8000 ssl;
    #listen [::]:8000 ssl;

    server_name localhost;
	
    #Debug Logging
    #error_log logs/error.log debug;
	
    access_log logs/api_logging.log upstream_logging;
	
    #Configurations to support SSL
	
    #ssl_certificate      C://cert/localhost.crt;
    #ssl_certificate_key  C://cert/localhost.key;

    #ssl_session_cache    shared:SSL:1m;
    #ssl_session_timeout  5m;

    #ssl_ciphers  HIGH:!aNULL:!MD5;
    #ssl_prefer_server_ciphers  on;

    proxy_buffering off;
    proxy_busy_buffers_size   512k;
    proxy_buffers   4 512k;
    proxy_buffer_size   256k;	
	
	location / {
            root   html;
            index  index.html index.htm;
    }


    location /test/api {
		proxy_pass https://api.server.com;   

		proxy_set_header Host $proxy_host;
		proxy_set_header X-Real-IP $remote_addr;
		proxy_set_header Header1  Header1 value;
                proxy_set_header Header2  Header2 value;
		proxy_ssl_name   $proxy_host;    
		proxy_ssl_server_name on; 
		
		#Set the Response Header with backend response headers for debugging
		
		#add_header X-Backend-Status $upstream_status;
		#add_header X-Backend-Server $upstream_addr;
		#add_header X-Backend-Content-Type $upstream_http_content_type;
		#add_header X-Backend-Content-Length $upstream_http_content_length;
		
		proxy_connect_timeout 60s;       # Timeout for connecting to the upstream
		proxy_send_timeout 60s;          # Timeout for sending to the upstream
		proxy_read_timeout 60s;          # Timeout for reading from the upstream


    }
}

   
}


       
 

Save the file and reload NGINX: 

       
 nginx -s reload
       
 

Test the Setup:

Access the API in your browser through localhost e.g http://localhost:8000/test/api/getOrder in your browser. It should forward requests to your backend server and return the response.

Stop NGINX server
 
       
 nginx -s stop      
 
Additionally, you can 

  • Enable HTTPS: Secure the proxy with SSL certificates.
  • Load Balancing: Add multiple backend servers:

  • Using NGINX as a reverse proxy on Windows is a powerful way to manage and secure web traffic.
    With a few simple steps, you can enable features like load balancing, SSL termination, and caching, making your application more robust and scalable.

    Thursday, November 14, 2024

    Securing PDFs Generated with Java: A Guide to Enhanced Protection

    As businesses increasingly rely on digital documentation, protecting sensitive information within PDFs has become a top priority. PDFs are widely used because they preserve document formatting across devices, but without adequate security measures, they can be vulnerable to unauthorized access, copying, or editing. In Java applications that generate PDFs, adding protection layers can ensure that confidential information stays secure.

    In this guide, we’ll explore how to enhance the security of PDFs generated using Java. From implementing password protection to setting user permissions and encryption, we’ll cover various approaches to safeguard your documents.

    Why Protect PDFs?

    Sensitive information—like financial records, proprietary research, and personal data—often ends up in PDF format. Securing these files not only helps prevent data breaches but also supports compliance with data privacy regulations like GDPR and HIPAA. Implementing PDF protection in your Java application can ensure that only authorized users have access and prevent alterations or copying of sensitive content.

    Techniques for Securing PDFs in Java

    Password Protection

    Password-protecting PDFs provides an effective security layer to limit access. Java libraries like iText and Apache PDFBox offer methods to apply two types of passwords:

    • User Password: Required to open the document, ensuring only authorized users can view its contents.
    • Owner Password: Grants full control over document permissions (such as printing or editing) and allows modification of the document’s security settings.

    Encryption:

    Encryption secures PDF content by converting it into unreadable code, making it accessible only to users with the correct credentials. PDF encryption can use different approaches depending on security requirements:

    Password-Based Encryption (Symmetric)


    This is the most common form of PDF encryption, where a single password or key is used to both encrypt and decrypt the document. Password-based encryption ensures that only users who know the user password can open and view the document. An owner password can also be set to control permissions, such as printing, copying, or editing.

    • Supported Encryption Algorithms: Libraries like PDFBox and iText support AES-128 and AES-256 encryption, providing strong security for PDF content.
    • Use Case: Ideal for general document security when you need a straightforward way to limit access.

    Setting Permissions:

    Permissions offer granular control over document actions, such as printing, copying, editing, and content extraction. The owner password allows you to set these permissions in the PDF. This is especially useful for documents with sensitive content where you want to prevent unauthorized actions. Libraries like iText and PDFBox enable developers to configure these permissions during PDF creation.


    Digital Signatures:

    Digital signatures authenticate a document’s origin and confirm that it hasn’t been altered since signing, making them essential for legal, financial, and compliance-focused documents. Created using a private key embedded in a certificate, digital signatures provide document authenticity and tamper-evidence.

    • Self-Signed Certificates: Useful for internal testing and validation, as they verify document integrity without requiring external validation. You can generate a self-signed certificate for demonstration purposes with the following keytool command:

      keytool -genkeypair -alias testkey -keyalg RSA -keysize 2048 -validity 365 -keystore keystore.p12 -storetype PKCS12 -storepass changeit -keypass changeit -dname "CN=Test User, OU=Development, O=YourCompany, L=City, S=State, C=US"
    • CA-Signed Certificates: Issued by trusted Certificate Authorities (CAs), CA-signed certificates are recommended for production environments as they provide credibility for official, external-facing documents. These certificates ensure maximum authenticity and compatibility, particularly in regulated industries.

    When obtaining a document signing certificate from a CA, there are typically two private key storage options:

    1. Hardware Tokens: Many CAs, including DigiCert, GlobalSign, and Entrust, require document signing certificates to be stored on a hardware token (usually a USB device). This token-based storage provides high security by storing the private key in a tamper-resistant environment. However, it has limitations for cloud applications and may not support fully automated workflows.

    2. Cloud HSM (Hardware Security Module): Some CAs support storing document signing keys in a Cloud HSM. Cloud HSMs are FIPS-compliant, secure environments that allow the private key to remain secure while still being accessible to cloud-hosted applications. Examples of cloud HSM services include:

      • AWS CloudHSM
      • Azure Dedicated HSM
      • Google Cloud HSM

      Cloud HSMs offer several advantages for cloud-based and automated workflows:

      • Secure Remote Access: Allows cloud applications to access the signing key securely.
      • Programmatic Access: Enables automated, high-volume signing directly from applications.
      • Compliance and Security: Meets FIPS standards for handling sensitive data.

    For production use, a CA-signed certificate is preferred to ensure that digital signatures are universally trusted and recognized. Be aware that some CAs may impose restrictions on the number of documents that can be signed per year or month, so choose an option that meets your specific needs.

    Watermarking: 

    Watermarking places visible text or images (e.g., "Confidential" or "Draft") across each page to visually indicate the document’s purpose or confidentiality level. Watermarks do not restrict access but serve as a deterrent to unauthorized distribution, making them particularly useful for internal or sensitive documents.

    iText:

    Sample code using the iText library to enable digital signatures. This example uses a self-signed certificate, but in a real scenario, a CA-signed certificate should be used as discussed earlier. Additionally, the code demonstrates how to control permissions, enable a user password, and add a watermark.


    import com.itextpdf.kernel.pdf.*;
    import com.itextpdf.kernel.geom.Rectangle;
    import com.itextpdf.kernel.pdf.canvas.PdfCanvas;
    import com.itextpdf.kernel.pdf.xobject.PdfImageXObject;
    import com.itextpdf.io.image.ImageDataFactory;
    import com.itextpdf.layout.Document;
    import com.itextpdf.layout.element.Paragraph;

    import com.itextpdf.signatures.*;

    import org.bouncycastle.jce.provider.BouncyCastleProvider;

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.security.KeyStore;
    import java.security.PrivateKey;
    import java.security.Security;
    import java.security.cert.Certificate;

    public class SignedPDFExample {

        private static final String KEYSTORE_PATH = "keystore.p12";
        // Path to your keystore
        private static final char[] KEYSTORE_PASSWORD = "changeit".toCharArray();
        private static final String KEY_ALIAS = "testkey";
        // Alias of the key
        private static final String WATERMARK_IMAGE_PATH = "watermark.png";
        // Path to the watermark image

        public static void main(String[] args) {
            try {
                Security.addProvider(new BouncyCastleProvider());

                // Generate, watermark, and sign the PDF
                ByteArrayOutputStream signedOutputStream =
                        generateWatermarkedAndSignedPdf();
                try (FileOutputStream fos =
                    new FileOutputStream("signed_watermarked_test.pdf")) {
                    fos.write(signedOutputStream.toByteArray());
                }
                System.out.println("Signed and watermarked PDF created:
                    signed_watermarked_test.pdf");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        public static ByteArrayOutputStream generateWatermarkedAndSignedPdf()
            throws Exception {
            // Load the keystore
            KeyStore ks = KeyStore.getInstance("PKCS12");
            ks.load(new FileInputStream(KEYSTORE_PATH), KEYSTORE_PASSWORD);
            PrivateKey privateKey = (PrivateKey) ks.getKey(KEY_ALIAS, KEYSTORE_PASSWORD);
            Certificate[] chain = ks.getCertificateChain(KEY_ALIAS);

            // Define permissions and set owner password (no user password required)
            String ownerPassword = "ownerpassword";
    String userPassword = "userpassword";
            WriterProperties writerProperties = new WriterProperties()
                    .setStandardEncryption(
                            userPassword.getBytes(), // user password is set
                            ownerPassword.getBytes(), // owner password
                            EncryptionConstants.ALLOW_PRINTING |
                            EncryptionConstants.ALLOW_COPY,
                            // Adjust permissions as needed
                            EncryptionConstants.ENCRYPTION_AES_256);

            // Generate PDF in memory with encryption properties
            ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
            PdfWriter writer = new PdfWriter(outputStream, writerProperties);
            PdfDocument pdfDoc = new PdfDocument(writer);
            Document document = new Document(pdfDoc);
            document.add(
                    new Paragraph("This is a test PDF with a digital signature,
                            watermark, and restricted permissions."));

            // Apply watermark to each page
            PdfImageXObject watermarkImage =
                new PdfImageXObject(ImageDataFactory.create(WATERMARK_IMAGE_PATH));
            // Loop through all pages to add watermark
            int numberOfPages = pdfDoc.getNumberOfPages();
            for (int i = 1; i <= numberOfPages; i++) {
                PdfCanvas canvas = new PdfCanvas(pdfDoc.getPage(i));
                Rectangle pageSize = pdfDoc.getPage(i).getPageSize();
                // Updated to Rectangle type
                float x = (pageSize.getWidth() - watermarkImage.getWidth()) / 2;
                float y = (pageSize.getHeight() - watermarkImage.getHeight()) / 2;

                canvas.saveState();
                canvas.addXObject(watermarkImage, x, y);
                canvas.restoreState();
            }

            document.close();

            // Sign the PDF in memory
            ByteArrayOutputStream signedOutputStream = new ByteArrayOutputStream();

            // Initialize PdfReader with the owner password to allow modifications for
            // signing
            PdfReader reader = new PdfReader(new ByteArrayInputStream
                                            (outputStream.toByteArray()),
                    new ReaderProperties().setPassword(ownerPassword.getBytes()));
            PdfSigner signer = new PdfSigner(reader, signedOutputStream,
                                new StampingProperties());

            // Configure signature appearance
            PdfSignatureAppearance appearance = signer.getSignatureAppearance();
            appearance.setReason("Testing Signature").setLocation("USA");

            // Cryptographic parameters
            IExternalDigest digest = new BouncyCastleDigest();
            IExternalSignature signature = new PrivateKeySignature(privateKey, "SHA-256",
                    BouncyCastleProvider.PROVIDER_NAME);

            // Sign and return signed PDF stream with editing restrictions
            signer.signDetached(digest, signature, chain, null, null, null, 0,
                                PdfSigner.CryptoStandard.CMS);
            return signedOutputStream;
        }
    }



    Add iText Maven Repo to the Pom.xml

        <repositories>
          <repository>
            <id>itext-releases</id>
            <name>iText Repository - releases</name>
            <url>https://repo.itextsupport.com/releases</url>
          </repository>
        </repositories>

    Add below Dependencies

        <dependency>        
             <groupId>com.itextpdf</groupId>        
             <artifactId>kernel</artifactId>        
             <version>7.1.5</version>    
        </dependency>

        <dependency>        
             <groupId>com.itextpdf</groupId>        
             <artifactId>sign</artifactId>        
             <version>7.1.5</version>    
        </dependency>      
       
        <dependency>
            <groupId>com.itextpdf</groupId>
            <artifactId>sign</artifactId>
            <version>7.1.5</version>
        </dependency>      
        <dependency>
            <groupId>org.bouncycastle</groupId>
            <artifactId>bcpkix-jdk15on</artifactId>
            <version>1.68</version>
        </dependency>

    Please note that iText requires a license for commercial use. Alternatively, PDFBox is licensed under the Apache License and is free to use in all cases.

    PDFBox:

    Sample code using the PDFBox library to enable digital signatures. This example uses a self-signed certificate, but in a real scenario, a CA-signed certificate should be used as discussed earlier. Additionally, the code demonstrates how to control permissions, enable a user password, and add a watermark.

    import org.apache.pdfbox.Loader;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.pdmodel.PDPageContentStream;
    import org.apache.pdfbox.pdmodel.PDPageTree;
    import org.apache.pdfbox.pdmodel.encryption.AccessPermission;
    import org.apache.pdfbox.pdmodel.encryption.StandardProtectionPolicy;
    import org.apache.pdfbox.pdmodel.font.PDType1Font;
    import org.apache.pdfbox.pdmodel.font.Standard14Fonts;
    import org.apache.pdfbox.pdmodel.common.PDRectangle;
    import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
    import org.apache.pdfbox.pdmodel.interactive.digitalsignature.PDSignature;
    import org.apache.pdfbox.pdmodel.interactive.digitalsignature.SignatureInterface;
    import org.bouncycastle.cert.jcajce.JcaCertStore;
    import org.bouncycastle.cert.jcajce.JcaX509CertificateHolder;
    import org.bouncycastle.cms.CMSSignedDataStreamGenerator;
    import org.bouncycastle.operator.ContentSigner;
    import org.bouncycastle.operator.jcajce.JcaContentSignerBuilder;
    import org.bouncycastle.operator.jcajce.JcaDigestCalculatorProviderBuilder;

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.security.KeyStore;
    import java.security.PrivateKey;
    import java.security.Security;
    import java.security.cert.Certificate;
    import java.security.cert.X509Certificate;
    import java.util.ArrayList;
    import java.util.Calendar;
    import java.util.List;

    public class PDFBoxSignedWatermarkedExample {

        private static final String KEYSTORE_PATH = "keystore.p12";
        private static final char[] KEYSTORE_PASSWORD = "changeit".toCharArray();
        private static final String KEY_ALIAS = "testkey";
        private static final String WATERMARK_IMAGE_PATH = "watermark.png";

        public static void main(String[] args) {
            try {
                Security.addProvider(
                        new org.bouncycastle.jce.provider.BouncyCastleProvider());

                // Generate, watermark, and sign the PDF
                ByteArrayOutputStream signedOutputStream =
                        generateWatermarkedAndSignedPdf();
                try (FileOutputStream fos =
                    new FileOutputStream("signed_watermarked_test.pdf")) {
                    fos.write(signedOutputStream.toByteArray());
                }
                System.out.println("Signed and watermarked PDF created:
                     signed_watermarked_test.pdf");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        public static ByteArrayOutputStream generateWatermarkedAndSignedPdf()
                throws Exception {
            KeyStore ks = KeyStore.getInstance("PKCS12");
            ks.load(new FileInputStream(KEYSTORE_PATH), KEYSTORE_PASSWORD);
            PrivateKey privateKey = (PrivateKey) ks.getKey(KEY_ALIAS, KEYSTORE_PASSWORD);
            Certificate[] certificateChain = ks.getCertificateChain(KEY_ALIAS);

            ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
            PDDocument document = new PDDocument();

            // Add a page with sample content
            PDPage page = new PDPage(PDRectangle.A4);
            document.addPage(page);

            try (PDPageContentStream contentStream =
                    new PDPageContentStream(document, page)) {
                contentStream.beginText();
                contentStream.setFont(
                            new PDType1Font(Standard14Fonts.FontName.HELVETICA_BOLD), 12);
                contentStream.newLineAtOffset(100, 700);
                contentStream
                        .showText("This is a test PDF with a digital signature,
                                        watermark, and restricted permissions.");
                contentStream.endText();
            }

            // Apply watermark to each page
            PDImageXObject watermarkImage =
                    PDImageXObject.createFromFile(WATERMARK_IMAGE_PATH, document);
            PDPageTree pages = document.getPages();
            for (PDPage pdfPage : pages) {
                PDRectangle pageSize = pdfPage.getMediaBox();
                try (PDPageContentStream contentStream =
                        new PDPageContentStream(document, pdfPage,
                        PDPageContentStream.AppendMode.APPEND, true, true)) {
                    float x = (pageSize.getWidth() - watermarkImage.getWidth()) / 2;
                    float y = (pageSize.getHeight() - watermarkImage.getHeight()) / 2;
                    contentStream.drawImage(watermarkImage, x, y);
                }
            }

            // Set permissions with owner password and user password
            AccessPermission accessPermission = new AccessPermission();
            accessPermission.setCanPrint(true);
            accessPermission.setCanModify(false);
            accessPermission.setCanExtractContent(true);

            StandardProtectionPolicy protectionPolicy =
                        new StandardProtectionPolicy("ownerpassword", "userpassword",
                    accessPermission);
            protectionPolicy.setEncryptionKeyLength(256);
            document.protect(protectionPolicy);

            // Save the document with encryption applied to an output stream
            document.save(outputStream);
            document.close();

            // Sign the document
            return signPdf(new ByteArrayInputStream(outputStream.toByteArray()),
                            privateKey, certificateChain);
        }

        public static ByteArrayOutputStream signPdf(InputStream pdfInputStream,
                            PrivateKey privateKey, Certificate[] chain)
                throws Exception {
            // Read all bytes from InputStream for PDFBox compatibility
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            int bytesRead;
            byte[] data = new byte[1024];
            while ((bytesRead = pdfInputStream.read(data, 0, data.length)) != -1) {
                buffer.write(data, 0, bytesRead);
            }
            buffer.flush();
            byte[] pdfBytes = buffer.toByteArray();

            // Load the PDF document from byte array with the user password
            PDDocument document = Loader.loadPDF(pdfBytes, "userpassword");

            PDSignature signature = new PDSignature();
            signature.setFilter(PDSignature.FILTER_ADOBE_PPKLITE);
            signature.setSubFilter(PDSignature.SUBFILTER_ADBE_PKCS7_DETACHED);
            signature.setName("Test User");
            signature.setLocation("USA");
            signature.setReason("Testing Digital Signature");
            signature.setSignDate(Calendar.getInstance());

            document.addSignature(signature, new SignatureInterface() {
                @Override
                public byte[] sign(InputStream content) {
                    try {
                        return signContent(content, privateKey, chain);
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                    return null;
                }
            });

            ByteArrayOutputStream signedOutput = new ByteArrayOutputStream();
            document.saveIncremental(signedOutput);
            document.close();
            return signedOutput;
        }

        public static byte[] signContent(InputStream content, PrivateKey privateKey,
                 Certificate[] chain) throws Exception {
            List<X509Certificate> certList = new ArrayList<>();
            for (Certificate cert : chain) {
                certList.add((X509Certificate) cert);
            }

            CMSSignedDataStreamGenerator cmsStreamGenerator =
                        new CMSSignedDataStreamGenerator();
            cmsStreamGenerator.addCertificates(new JcaCertStore(certList));

            ContentSigner contentSigner = new JcaContentSignerBuilder("SHA256withRSA")
                                            .setProvider("BC").build(privateKey);
            JcaX509CertificateHolder certificateHolder =
                    new JcaX509CertificateHolder((X509Certificate) chain[0]);

            cmsStreamGenerator.addSignerInfoGenerator(
                    new org.bouncycastle.cms.SignerInfoGeneratorBuilder(
                            new JcaDigestCalculatorProviderBuilder()
                            .setProvider("BC").build())
                            .build(contentSigner, certificateHolder));

            ByteArrayOutputStream signatureOutput = new ByteArrayOutputStream();
            try (OutputStream out = cmsStreamGenerator.open(signatureOutput, true)) {
                byte[] buffer = new byte[8192];
                int bytesRead;
                while ((bytesRead = content.read(buffer)) != -1) {
                    out.write(buffer, 0, bytesRead);
                }
            }

            return signatureOutput.toByteArray();
        }
    }



    Dependencies:

    <dependency>
         <groupId>org.apache.pdfbox</groupId>
         <artifactId>pdfbox</artifactId>
          <version>3.0.3</version>
     </dependency>
         
    <dependency>
          <groupId>org.bouncycastle</groupId>
          <artifactId>bcpkix-jdk15on</artifactId>
          <version>1.68</version>
    </dependency>

    Conclusion:

    Protecting PDFs generated in Java applications can provide an essential layer of security for sensitive information. By leveraging tools like iText, PDFBox you can ensure your documents are password-protected, encrypted, and permission-controlled. Implementing these security practices strengthens your application’s data privacy and compliance posture, providing peace of mind for both developers and end users.

    Thursday, October 31, 2024

    How to Resolve org.apache.poi.util.RecordFormatException: "Tried to Allocate an Array of Length, but the Maximum Length is 100,000,000

    When uploading and processing a large .xls file through Java in an AEM backend, I encountered the following exception:

    Exception in thread "main" org.apache.poi.util.RecordFormatException:
    Tried to allocate an
    array of length 159,135,035, but the maximum length for this record type is 100,000,000.
    If the file is not corrupt and not large, please open an issue on bugzilla to
    request increasing the maximum allowable size for this record type. You can set a higher override value with IOUtils.setByteArrayMaxOverride()

    Stack Trace:

    at org.apache.poi.util.IOUtils.throwRFE(IOUtils.java:596)
    at org.apache.poi.util.IOUtils.checkLength(IOUtils.java:281) ... at ExcelReader.readSpreadsheet(ExcelReader.java:17) at ExcelReader.main(ExcelReader.java:52)



    Explanation of the Issue

    This exception occurs because Apache POI tries to allocate a large byte array to process the Excel file, exceeding the default size limit of 100,000,000 bytes. The issue is often due to the size or structure of the .xls file, which requires a larger array than Apache POI allows by default.

    Solutions

    There are two primary ways to resolve this issue:


    Option 1: Increase the Byte Array Limit

    You can override the maximum allowable byte array size using IOUtils.setByteArrayMaxOverride(). Setting this to a higher value, such as 200,000,000, may help process larger Excel files. However, this approach can impact memory usage and may lead to performance issues with very large files. Before increasing the size, make sure the issue isn’t due to a corrupted or incorrect file format.

    Updated Code Snippet:


    import org.apache.poi.ss.usermodel.*; import org.apache.poi.ss.usermodel.WorkbookFactory; import org.apache.poi.util.IOUtils; import java.io.File; import java.io.FileInputStream; import java.io.IOException; import java.util.Iterator; public class ExcelReader { public void readSpreadsheet(String filePath) { Workbook workBook = null; try (FileInputStream fis = new FileInputStream(new File(filePath))) { // Increase the byte array limit for larger files IOUtils.setByteArrayMaxOverride(200_000_000); // Create Workbook instance workBook = WorkbookFactory.create(fis); Sheet sheet = workBook.getSheetAt(0); Iterator<Row> rowIter = sheet.rowIterator(); boolean firstRow = true; while (rowIter.hasNext()) { Row row = rowIter.next(); if (!firstRow) { Iterator<Cell> cellIter = row.cellIterator(); while (cellIter.hasNext()) { Cell cell = cellIter.next(); System.out.print(cell + "\t"); // Print cell content } System.out.println(); // Newline for each row } else { firstRow = false; } } } catch (IOException e) { System.err.println("IO Exception: " + e.getMessage()); } finally { if (workBook != null) { try { workBook.close(); } catch (IOException e) { System.err.println("Unable to close Workbook object: "
                         + e.getMessage()); } } } } public static void main(String[] args) { String filePath = "Test.xlsx"; ExcelReader reader = new ExcelReader(); reader.readSpreadsheet(filePath); } }

    Command to Compile and Run:

    javac -cp ".;poi-5.3.0.jar;poi-ooxml-5.3.0.jar;poi-ooxml-full-5.3.0.jar;poi-ooxml-schemas-5.3.0.jar;xmlbeans-5.1.1.jar;commons-collections4-4.5.0-M2.jar;commons-io-2.17.0.jar;log4j-api-2.24.1.jar;commons-compress-1.27.1.jar;log4j-core-2.24.1.jar" ExcelReader.java

    java -cp ".;poi-5.3.0.jar;poi-ooxml-5.3.0.jar;poi-ooxml-full-5.3.0.jar;poi-ooxml-schemas-5.3.0.jar;xmlbeans-5.1.1.jar;commons-collections4-4.5.0-M2.jar;commons-io-2.17.0.jar;log4j-api-2.24.1.jar;commons-compress-1.27.1.jar;log4j-core-2.24.1.jar" ExcelReader

    Option 2: Use the Streaming API (SXSSFWorkbook)

    Using SXSSFWorkbook, a streaming API in Apache POI, allows you to process large .xlsx files in a memory-efficient way. This approach loads only a small part of the file into memory at any time, making it suitable for processing large Excel files.

    Code Using SXSSFWorkbook:


    import org.apache.poi.ss.usermodel.*; import org.apache.poi.xssf.usermodel.XSSFWorkbook; import org.apache.poi.xssf.streaming.SXSSFWorkbook; import java.io.File; import java.io.FileInputStream; import java.io.IOException; import java.util.Iterator; public class ExcelReaderStream { public void readSpreadsheet(String filePath) { SXSSFWorkbook streamingWorkbook = null; try (FileInputStream fis = new FileInputStream(new File(filePath)); XSSFWorkbook workbook = new XSSFWorkbook(fis)) { // Wrap XSSFWorkbook in SXSSFWorkbook to enable streaming streamingWorkbook = new SXSSFWorkbook(workbook); Sheet sheet = streamingWorkbook.getSheetAt(0); // Iterate through rows in a memory-efficient way for (Row row : sheet) { Iterator<Cell> cellIter = row.cellIterator(); while (cellIter.hasNext()) { Cell cell = cellIter.next(); System.out.print(getCellValue(cell) + "\t"); } System.out.println(); // Newline for each row } } catch (IOException e) { System.err.println("IO Exception: " + e.getMessage()); } finally { if (streamingWorkbook != null) { try { streamingWorkbook.dispose(); // Dispose of temporary files streamingWorkbook.close(); } catch (IOException e) { System.err.println("Unable to close SXSSFWorkbook object: "
                        + e.getMessage()); } } } } // Utility method to get the value of a cell as a String private String getCellValue(Cell cell) { switch (cell.getCellType()) { case STRING: return cell.getStringCellValue(); case NUMERIC: if (DateUtil.isCellDateFormatted(cell)) { return cell.getDateCellValue().toString(); } else { return String.valueOf(cell.getNumericCellValue()); } case BOOLEAN: return String.valueOf(cell.getBooleanCellValue()); case FORMULA: return cell.getCellFormula(); default: return ""; } } public static void main(String[] args) { String filePath = "Test.xlsx"; ExcelReaderStream reader = new ExcelReaderStream(); reader.readSpreadsheet(filePath); } }

    Command to Compile and Run:

    javac -cp ".;poi-5.3.0.jar;poi-ooxml-5.3.0.jar;poi-ooxml-full-5.3.0.jar;poi-ooxml-schemas-5.3.0.jar;xmlbeans-5.1.1.jar;commons-collections4-4.5.0-M2.jar;commons-io-2.17.0.jar;log4j-api-2.24.1.jar;commons-compress-1.27.1.jar;log4j-core-2.24.1.jar" ExcelReaderStream.java

    java -cp ".;poi-5.3.0.jar;poi-ooxml-5.3.0.jar;poi-ooxml-full-5.3.0.jar;poi-ooxml-schemas-5.3.0.jar;xmlbeans-5.1.1.jar;commons-collections4-4.5.0-M2.jar;commons-io-2.17.0.jar;log4j-api-2.24.1.jar;commons-compress-1.27.1.jar;log4j-core-2.24.1.jar" ExcelReaderStream

    Conclusion

    • Option 1: Increasing the byte array limit using IOUtils.setByteArrayMaxOverride() can solve the immediate issue but may impact memory usage.
    • Option 2: Using SXSSFWorkbook with the streaming API is a more scalable solution for large .xlsx files, allowing memory-efficient processing.

    Choose the approach that best suits your file size and memory requirements. Let me know if you have any questions!