How To Split PDFs With Dynamic Ranges In UiPath

This video tutorial shows how to split a PDF into multiple PDFs with UiPath. The second part of the guide shows how to split PDFs with dynamic ranges (the number of pages per document can vary). The use case also involves a lot of UiPath work with files and folders.

You could also watch:
🔵 Extract tables out of PDFs in UiPath - https://youtu.be/WPKEJLW7_Js
🔵 Invoice PDF Extraction with Regex in UiPath - https://youtu.be/uCdBC2pXPyY

0:00 Intro to the Use Case
We want to split a PDF file into multiple PDF files. Imagine a PDF with multiple invoices in it. The challenge is that we don't know in advance how many pages each invoice spans. The ranges are dynamic (an invoice can be 1, 2 or more pages).
📁 Download the files from the video: https://1drv.ms/u/s!Al9RjoWZcShJi548urnef_W4HY30Nw?e=zRyG9w

1:27 Install the PDF Package in UiPath
We install the UiPath.PDF.Activities package by UiPath in order to use the PDF activities.

1:49 Split a PDF into multiple one page PDFs
In the first part we split our PDF into multiple one-page PDFs. This solution only produces one-page PDFs, so it falls short when a document inside the PDF spans more than one page.

1:59 Get all filenames in a folder
We use a For Each and the .NET method Directory.GetFiles to get all files in a folder as strings, so we can work with them. As a best practice, remember to create a variable for the folder path and not hardcode it in the activity. We also define the searchPattern, so we only look for certain file types.
Buy this book to learn all about VB.NET (the coding language in UiPath): https://geni.us/v6ffI (AFFILIATE).

4:30 Get PDF Page Count
We use the Get PDF Page Count activity to get the total page count of our PDF. The count is stored as an integer.

5:44 Extract the PDF pages one by one
Using two page counters (one for the current page and one for the total page count) and a While loop, we can iterate through the entire PDF. Use an Extract PDF Page Range activity and remember to add one to your current page counter in each iteration.
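
The file filtering and the page-by-page loop described above can be sketched outside UiPath. This is a minimal Python illustration of the logic only, not the workflow from the video: the function names are hypothetical, the file list and page count are passed in as plain values (in the workflow they would come from Directory.GetFiles and Get PDF Page Count), and the returned strings assume that Extract PDF Page Range accepts a single page number as its range.

```python
import fnmatch


def filter_by_pattern(file_names, search_pattern="*.pdf"):
    """Keep only the names that match the pattern (case-insensitive),
    roughly what the searchPattern argument does for a folder listing."""
    pattern = search_pattern.lower()
    return [name for name in file_names if fnmatch.fnmatch(name.lower(), pattern)]


def single_page_ranges(total_pages):
    """Return one range string per page: "1", "2", ..., up to total_pages."""
    ranges = []
    current_page = 1  # counter for the current page
    while current_page <= total_pages:
        ranges.append(str(current_page))
        current_page += 1  # without this increment the loop never ends
    return ranges


if __name__ == "__main__":
    print(filter_by_pattern(["invoices.pdf", "notes.txt", "SCAN.PDF"]))
    print(single_page_ranges(3))
```

Each returned string stands for one extraction step, so a 3-page PDF leads to three one-page output files.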
We use another .NET method, Path.GetFileNameWithoutExtension, to get the file names without extensions. A good idea is to add a unique ID to the extracted file name.

10:57 Create folder if it doesn't exist
We create the Output folder if it doesn't exist, using the activities Path Exists, If and Create Folder. Here you will learn to work with folders and booleans. Add this check so the workflow always verifies that the Output folder exists and creates it if it doesn't.

12:58 Split the PDF in multiple one or two page PDFs
Now we expand our solution to also cover documents that span either one or two pages. We read each PDF page into a string, so we can apply Regex to it and check whether a "Page 2" exists. If yes, we know it's a two-page document. The solution is then to extract two pages (the current one and the previous one) and overwrite the PDF that was extracted for the previous page.

18:16 Split PDF with dynamic ranges
Now the documents to be extracted can be of any length, so we need a general solution. The intuition is that we check whether there is a page 2; if yes, we check for a page 3; if yes, we check for a page 4, and so forth. That is another While loop and a Regex Matches activity.

Connect with me:
🔔 Subscribe - http://www.youtube.com/user/klogeanders?sub_confirmation=1
💼 LinkedIn - https://www.linkedin.com/in/andersjensens/
👥 Facebook - https://www.facebook.com/andersjensenorg
💌 Email Newsletter - https://andersjensen.org/email-newsletter/

#uipath #rpa #automation
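
The dynamic-range idea from the 18:16 chapter can be sketched as a small grouping routine. This is a Python illustration of the logic, not the UiPath workflow from the video. It assumes the text of every page has already been read into a string and that continuation pages carry a marker such as "Page 2" or "Page 3", while first pages carry "Page 1" or no marker at all. The function names and the regex are hypothetical.

```python
import re

# Assumed marker format: the word "Page" followed by a number.
PAGE_MARKER = re.compile(r"\bPage\s+(\d+)\b")


def is_continuation(page_text):
    """True when the page is marked as page 2 or later of a document."""
    match = PAGE_MARKER.search(page_text)
    return match is not None and int(match.group(1)) > 1


def dynamic_ranges(page_texts):
    """Group consecutive pages into range strings such as "1" or "2-3"."""
    ranges = []
    total = len(page_texts)
    current = 0  # zero-based index of the page being examined
    while current < total:
        start = current
        current += 1
        # Inner loop: keep extending the document while the next page
        # is a continuation page (page 2, then page 3, and so forth).
        while current < total and is_continuation(page_texts[current]):
            current += 1
        first, last = start + 1, current  # one-based, inclusive
        ranges.append(str(first) if first == last else f"{first}-{last}")
    return ranges


if __name__ == "__main__":
    pages = [
        "Invoice A Page 1",
        "Invoice B Page 1",
        "Invoice B Page 2",
        "Invoice C",
        "Invoice C Page 2",
        "Invoice C Page 3",
    ]
    print(dynamic_ranges(pages))
```

Every range is built in one pass, so no extracted file has to be overwritten later, which is a deliberate difference from the extract-and-overwrite step used for the two-page case.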