P&ID digitization. A practitioner's guide
A Dutch HVAC set uses NEN 2778 symbols and area-unit-instrument tag structure. A North Sea platform set uses NORSOK I-005 and KKS. A Texas refinery set uses ISA 5.1 with a house convention the original EPC wrote in 1989. Reading all three as the same grammar produces an output none of the three operators can use. The standard in the drawing dictates the standard in the output.
What digitization actually means.
Digitizing a P&ID is the process of turning a drawing (vector PDF, scanned PDF, or image) into structured data: an instrument list, an equipment list, a line list, a connectivity model, and the documents those drive. It is not the same as redrawing the P&ID in a CAD package. Redrawing produces a new vector file. Digitization produces structured rows you can query, count, export, and connect to a configuration tool.
The distinction matters because the use cases differ. If you need to maintain the drawing in a CAD-of-record system, you redraw or import to AutoCAD P&ID, SmartPlant P&ID, or similar. If you need the data behind the drawing for a controls upgrade, a brownfield revamp, an MOC document, or an instrument index, you digitize. The two activities are sometimes done together, but the documents are independent.
What you are starting with.
Digitization projects fall into three buckets, and the bucket determines the effort.
Vector PDFs from a CAD system are the easy case. Text is selectable, lines are real geometry, symbols come from a library. Tag readability is high, line tracing is reliable, equipment shapes match a known set. A 50-page set in this bucket can be processed in a working day.
CAD-exported PDFs that have been through 'convert text to outlines' are harder. Text is no longer text; it is curves. Tags must be read from the page image even though the original drawing was vector. This is more common than people expect, especially with drawings shared between EPCs that did not want to ship live AutoCAD files.
Scanned legacy drawings are the hard case. The drawing was paper, someone ran it through a scanner ten years ago at 200 DPI, and the file has been photocopied and re-scanned since. Skewed pages, stains, fold creases, ink bleed. Tag readability is lower, line geometry is harder to follow, and instrument symbols must tolerate distortion. Throughput is lower, review is more important, and the value to the customer is much higher because nobody else wants to do this work.
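The three buckets can be told apart from a few per-page counts. The following is a minimal sketch, assuming you already have those counts from whatever PDF reader you use; `PageStats` and its field names are hypothetical, and real sets will need extra rules (for example, a scan that carries an OCR text layer).

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    """Hypothetical per-page counts gathered by your PDF reader of choice."""
    selectable_chars: int   # characters available as real text
    vector_paths: int       # line and curve objects in the page content
    raster_images: int      # embedded bitmap images


def classify_source(stats: PageStats) -> str:
    """Return 'vector', 'outlined', or 'scanned' for one page."""
    if stats.vector_paths > 0 and stats.selectable_chars > 0:
        return "vector"      # text is text, lines are geometry
    if stats.vector_paths > 0:
        return "outlined"    # geometry present, but text became curves
    return "scanned"         # nothing but pixels: read everything from the image


print(classify_source(PageStats(selectable_chars=0, vector_paths=1800, raster_images=0)))
```

Classifying per page rather than per file is deliberate in this sketch, since mixed sets often combine CAD exports with scanned legacy sheets.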
What to extract, in order.
A digitization pass produces, in roughly this order: instruments (tag, signal class, description, page region); equipment (pumps, vessels, tanks, exchangers, with shape and tag); nozzles attached to equipment; lines (line number, size, spec, service); connections between instruments and equipment; off-page connectors (TO DWG-XXX, FROM DWG-YYY); and panels and scope frames where present.
The instrument list and equipment list are the documents most projects pay for. The line list adds value when the project needs piping documents. The connectivity model (which line connects which equipment, which instrument sits on which line, which loop owns which instruments) is what enables the high-value downstream tasks: control loop generation, hazard analysis cross-reference, and cause-and-effect matrices.
A mature digitization workflow extracts all of these in a single pass and lets you choose what to export. Extracting them in separate passes against the same drawing wastes time and creates reconciliation work.
Scanned vs CAD-exported PDFs.
On a CAD-exported vector PDF the text is selectable, the lines are vector paths, and the symbol shapes come from a library. The extraction pipeline can read tag text directly with high reliability, snap bubble centers to vector geometry, and trace lines through the page. Coverage is high and review-flag rate is low.
On a scanned drawing the same input flow does more work. The page is rendered to high-DPI raster, text is read from the scan, shapes are identified, and the results are assembled. Tag readability is lower. Line tracing must reconstruct geometry from raster, which is harder when the scan has speckle or skew. The review-flag rate climbs from a few percent to ten or twenty percent on poor scans.
The right expectation is this: clean scans converge on CAD-exported quality, and bad scans always need more review. Investing in straightening and cleaning the source files before digitization (deskew, denoise, contrast) is usually worth it.
Tag conventions you will encounter.
ISA 5.1 is the most common standard in North American process industries: FIT-101, TI-202, PSV-303. Tags carry two or three letters that decompose into a first letter (the measured variable) and succeeding letters (modifier and function). The standard publishes the canonical set, but vendors and operators extend it with letters not in the spec.
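As a rough illustration of that decomposition, the sketch below splits an ISA-style tag into first letter, succeeding letters, and loop number. The regular expression and the four-entry letter table are assumptions for illustration only; the full letter table lives in the standard, and house extensions need their own entries.

```python
import re
from typing import Optional

# Partial first-letter table (ISA 5.1 meanings); extend it per project.
MEASURED_VARIABLE = {
    "F": "flow",
    "L": "level",
    "P": "pressure",
    "T": "temperature",
}

# One first letter, one to three succeeding letters, optional hyphen,
# loop number, optional single-letter suffix. Example: FIT-101.
ISA_TAG = re.compile(r"^([A-Z])([A-Z]{1,3})-?(\d+)([A-Z]?)$")


def parse_isa_tag(tag: str) -> Optional[dict]:
    """Split an ISA-style tag into its parts, or return None if it does not fit."""
    m = ISA_TAG.match(tag.strip().upper())
    if m is None:
        return None
    first, succeeding, loop, suffix = m.groups()
    return {
        "first_letter": first,
        "measured_variable": MEASURED_VARIABLE.get(first, "unknown"),
        "succeeding_letters": succeeding,
        "loop_number": loop,
        "suffix": suffix,
    }


print(parse_isa_tag("FIT-101"))
```

Returning "unknown" for letters outside the table, rather than rejecting the tag, leaves room for the vendor and operator extensions mentioned above.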
KKS is heavy in European power and combined-cycle plants. 10LBA10AT001. Numeric prefixes encode unit, system, sub-system, then a function code. KKS is dense and machine-readable but visually unfamiliar to ISA-trained engineers.
IEC 81346 is the cross-industry equivalent: =A1+B2-K3.1, with structuring prefixes that segment by aspect (= for function, + for location, - for product). DIN 19227 is the older German P&ID convention.
NORSOK uses Norwegian offshore-specific tag conventions. JIS uses Japanese-flavored ISA. GOST appears on Eastern European and Russian plants.
In-house standards are real and constant. A 1987 refinery in Texas might use H instead of P for pressure because the in-house letter set diverged from ISA before the merger. A pharma plant might prefix every tag with the suite number. A digitization tool that cannot read non-ISA conventions will fail half the projects it sees. The right approach is to detect the convention in use, then extract within that convention rather than forcing every tag through an ISA filter.
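One way to approach "detect first, then extract" is a simple vote over the tags read from a drawing. This is a sketch under loud assumptions: the three patterns are rough illustrations, not complete grammars of ISA 5.1, KKS, or IEC 81346, and the `min_share` threshold is arbitrary.

```python
import re
from collections import Counter

# Rough, illustrative patterns; real drawings need per-project tuning.
PATTERNS = {
    "ISA 5.1": re.compile(r"^[A-Z]{2,5}-?\d{2,5}[A-Z]?$"),
    "KKS": re.compile(r"^\d{1,2}[A-Z]{3}\d{2}[A-Z]{2}\d{3}$"),
    "IEC 81346": re.compile(r"^[=+\-][A-Z]+\d+"),
}


def detect_convention(tags, min_share=0.6):
    """Return the convention most tags match, or 'unknown' if none dominates."""
    votes = Counter()
    for tag in tags:
        for name, pattern in PATTERNS.items():
            if pattern.match(tag):
                votes[name] += 1
                break
    if not votes:
        return "unknown"
    name, count = votes.most_common(1)[0]
    return name if count / len(tags) >= min_share else "unknown"


print(detect_convention(["FIT-101", "TI-202", "PSV-303"]))
print(detect_convention(["10LBA10AT001", "10LBA20CP001"]))
```

A house convention that swaps letters (H for pressure, say) still matches the ISA shape here, so the detected convention only selects the grammar; the letter meanings would come from a project-specific table.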
Instrument, line, and equipment associations.
The output includes instrument-to-line and equipment-to-line associations across the drawing set, so loop, equipment, and line groupings are available in the review interface and in the Excel export.
These associations enable loop detection. A control loop in the process sense is a controller, a measurement, a final control element, and the line that connects them. With instrument and line associations in place you can ask 'show me every loop where the measurement is FT-XXX and the final control element is on line 4-CS-XXX' and get a real answer. Without them you are reading P&IDs page by page.
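The kind of query described above can be sketched over a flat list of instrument rows. The `Instrument` record, its `role` values, and the sample tags are hypothetical; an actual export will have its own column names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instrument:
    tag: str
    loop: str             # loop number shared by the loop's instruments
    role: str             # "measurement", "controller", or "final_element"
    line: Optional[str]   # line the instrument sits on, if any


def find_loops(instruments, measurement_tag, final_element_line):
    """Return loop numbers whose measurement tag and final-element line both match."""
    by_loop = {}
    for inst in instruments:
        by_loop.setdefault(inst.loop, []).append(inst)
    hits = []
    for loop, members in by_loop.items():
        has_measurement = any(
            m.role == "measurement" and m.tag == measurement_tag for m in members
        )
        has_final = any(
            m.role == "final_element" and m.line == final_element_line for m in members
        )
        if has_measurement and has_final:
            hits.append(loop)
    return hits


instruments = [
    Instrument("FT-101", "101", "measurement", "4-CS-101"),
    Instrument("FIC-101", "101", "controller", None),
    Instrument("FV-101", "101", "final_element", "4-CS-101"),
    Instrument("FT-102", "102", "measurement", "6-CS-102"),
    Instrument("FV-102", "102", "final_element", "6-CS-102"),
]
print(find_loops(instruments, "FT-101", "4-CS-101"))  # ['101']
```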
Cross-page consistency checks follow from the same data. If line 4-CS-101 leaves the bottom of page 3 and enters the top of page 4 with a different size, that is an error worth flagging.
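A cross-page size check of this kind reduces to grouping line segments by line number. The sketch below assumes each segment has been read as a `(line_number, page, size)` tuple; that shape and the sample values are illustrative.

```python
from collections import defaultdict


def find_size_conflicts(segments):
    """segments: iterable of (line_number, page, size) tuples read from the set.

    Returns the lines that appear with more than one size, with the pages involved.
    """
    sizes = defaultdict(set)
    pages = defaultdict(set)
    for line_number, page, size in segments:
        sizes[line_number].add(size)
        pages[line_number].add(page)
    return {
        line: {"sizes": sorted(sizes[line]), "pages": sorted(pages[line])}
        for line in sizes
        if len(sizes[line]) > 1
    }


segments = [
    ("4-CS-101", 3, '4"'),
    ("4-CS-101", 4, '6"'),
    ("2-CW-205", 3, '2"'),
    ("2-CW-205", 5, '2"'),
]
print(find_size_conflicts(segments))
```

The result is a finding for review. A reducer between the two pages would be a legitimate reason for the size to change, although the line number usually changes with it.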
QA that catches what humans miss.
A digitization package should ship with an integrity report. Common checks: orphan equipment (equipment with no instruments and no connecting lines); unmatched off-page connectors (page 5 says 'TO DWG-7' but DWG-7 has no matching 'FROM DWG-5'); tag duplicates across the drawing set; instruments with missing equipment references; and lines with missing size or service.
None of these are pure errors. Some are intentional. A spare nozzle on a vessel will look orphaned. A line entering an off-page boundary that is genuinely outside the project scope will look unmatched. The job of the integrity report is to surface these for human review, not to throw exceptions.
A second discipline is the sanity-check pass: class breakdowns (AI, AO, DI, DO ratios), tag-count distribution by area, and count of equipment by type. If a process unit shows 80 percent DI it is probably a packaging unit. If it shows 5 percent DI on a refinery train, the limit switches are missing. These ratios are an at-a-glance check that the extraction caught what should be caught.
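A few of these checks can be sketched as a function that collects findings instead of raising. The input shapes (plain tag lists and tuples) and the check names are assumptions for illustration.

```python
from collections import Counter


def integrity_findings(equipment, instruments, lines):
    """Return review findings as dicts; nothing here raises on bad data.

    equipment:   list of equipment tags
    instruments: list of (instrument_tag, equipment_tag_or_None)
    lines:       list of (line_number, [equipment tags the line connects])
    """
    findings = []

    # Orphan equipment: no instrument references it and no line connects to it.
    referenced = {eq for _, eq in instruments if eq}
    connected = {eq for _, ends in lines for eq in ends}
    for tag in equipment:
        if tag not in referenced and tag not in connected:
            findings.append({"check": "orphan_equipment", "tag": tag})

    # Duplicate instrument tags across the drawing set.
    counts = Counter(tag for tag, _ in instruments)
    for tag, n in counts.items():
        if n > 1:
            findings.append({"check": "duplicate_tag", "tag": tag, "count": n})

    # Instruments with no equipment reference.
    for tag, eq in instruments:
        if eq is None:
            findings.append({"check": "missing_equipment_ref", "tag": tag})

    return findings


equipment = ["P-101", "V-201", "T-301"]
instruments = [("FIT-101", "P-101"), ("FIT-101", "P-101"), ("LT-201", None)]
lines = [("4-CS-101", ["P-101", "V-201"])]
for finding in integrity_findings(equipment, instruments, lines):
    print(finding)
```

Each finding is a plain row, so a reviewer can mark it as intentional (a spare, an out-of-scope boundary) without the pass failing.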
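The class-breakdown check is simple arithmetic over the extracted signal classes. In this sketch the 5 percent floor mirrors the refinery example above; treat it as an assumption to tune per unit type, not a rule.

```python
from collections import Counter


def class_shares(signal_classes):
    """Map each signal class (AI, AO, DI, DO) to its share of the I/O count."""
    counts = Counter(signal_classes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cls: counts[cls] / total for cls in ("AI", "AO", "DI", "DO")}


def flag_low_di(shares, floor=0.05):
    """True when the DI share is at or below the floor (floor is an assumption)."""
    return bool(shares) and shares["DI"] <= floor


# Illustrative unit: 60 AI, 20 AO, 4 DI, 16 DO.
classes = ["AI"] * 60 + ["AO"] * 20 + ["DI"] * 4 + ["DO"] * 16
shares = class_shares(classes)
print(shares, flag_low_di(shares))
```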
Delivering the digitized package.
The minimum useful package is an instrument list, an equipment list, a line list, an integrity report, and the original drawings annotated with the extraction so the customer can see what was read. Most customers also want the data in the format their downstream tooling consumes.
A controls engineer wants the I/O list in TIA Portal XML or Studio 5000 L5X. A piping group wants the line list and equipment list in Excel. A safety group wants the instrument list filtered to safety-rated tags. A document control group wants the integrity report and the version-stamped originals.
Treat the digitized data as a single source and the formats as views over it. Re-extract on revision, regenerate the views, ship the new package. Avoid the trap where the Excel export becomes the source of truth and the drawings drift. The drawings are always the source. The exports are always derived.
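The "single source, many views" idea can be sketched with one list of rows and small functions that derive each deliverable from it. The field names and sample rows are hypothetical, and CSV stands in here for whatever format the downstream tool actually consumes.

```python
import csv
import io

# Single source: one row per extracted instrument (field names are illustrative).
INSTRUMENTS = [
    {"tag": "FIT-101", "signal_class": "AI", "line": "4-CS-101", "safety": False},
    {"tag": "PSV-303", "signal_class": "", "line": "4-CS-101", "safety": True},
    {"tag": "ZSC-410", "signal_class": "DI", "line": "2-CW-205", "safety": True},
]


def to_csv(rows, columns):
    """Render the chosen columns of the rows as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def io_list_view(rows):
    """View for controls: only rows that carry a signal class."""
    return to_csv([r for r in rows if r["signal_class"]], ["tag", "signal_class"])


def safety_view(rows):
    """View for the safety group: only safety-rated tags."""
    return to_csv([r for r in rows if r["safety"]], ["tag", "line"])


print(io_list_view(INSTRUMENTS))
print(safety_view(INSTRUMENTS))
```

Because every view is a pure function of the rows, re-extracting on a revision and calling the same functions regenerates the whole package.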
FAQ.
Is digitization the same as P&ID redrawing?
No. Redrawing produces a new vector CAD file. Digitization produces structured data (instrument list, equipment list, line list, connectivity) from the drawing. The two are sometimes done together but the documents are independent and the workflows are different.
What file formats can be digitized?
Vector PDFs, CAD-exported PDFs with outlined text, scanned PDFs, and raster images (PNG, JPG, TIFF). Vector PDFs are easiest. Scanned drawings work but need cleaner source files for best results.
How long does a typical digitization project take?
A 50-page CAD-exported set is roughly a working day from upload to reviewed output. A 50-page scanned set with mixed quality is two to four days depending on review effort. The bottleneck on scanned drawings is review, not extraction.
What if my drawings use a non-ISA tag convention?
Modern digitization workflows detect the tag convention per drawing and extract within that convention. ISA 5.1, KKS, IEC 81346, NORSOK, JIS, GOST, DIN 19227, and a long tail of in-house conventions all work. The output is in your convention. Nothing gets force-converted to ISA.
Can the connectivity graph be exported?
Yes. DEXPI XML is the most common interchange format for connectivity. JSON is available for custom downstream consumers. Most projects do not need the connectivity export until they reach loop generation or HAZOP cross-reference, but extracting it in the first pass is cheaper than re-running later.
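For the JSON route, a connectivity export can be as small as a node list and an edge list. The schema below (`nodes`, `edges`, `kind`, `line`) is a hypothetical layout for a custom consumer, not a DEXPI mapping and not any tool's actual export format.

```python
import json


def connectivity_to_json(nodes, edges):
    """nodes: {tag: kind}; edges: [(from_tag, to_tag, line_number)]."""
    # Refuse to emit edges that point at tags missing from the node list.
    unknown = {t for a, b, _ in edges for t in (a, b)} - set(nodes)
    if unknown:
        raise ValueError(f"edges reference unknown tags: {sorted(unknown)}")
    doc = {
        "nodes": [{"tag": t, "kind": k} for t, k in sorted(nodes.items())],
        "edges": [{"from": a, "to": b, "line": ln} for a, b, ln in edges],
    }
    return json.dumps(doc, indent=2)


nodes = {"P-101": "pump", "V-201": "vessel", "FIT-101": "instrument"}
edges = [("P-101", "V-201", "4-CS-101"), ("FIT-101", "P-101", "4-CS-101")]
print(connectivity_to_json(nodes, edges))
```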
How are off-page connectors handled?
Each off-page connector is captured with its tag, its arrow direction (in or out), and the destination drawing reference if visible. Across the drawing set, connectors are paired so that every TO matches a FROM, and unmatched connectors are flagged in the integrity report.
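The pairing step can be sketched as follows. This version pairs on drawing references only and ignores the connector tag; that simplification and the tuple shape are assumptions of the sketch.

```python
def unmatched_connectors(connectors):
    """connectors: list of (drawing, direction, other_drawing) tuples,
    where direction is 'TO' or 'FROM'.

    A ('DWG-5', 'TO', 'DWG-7') entry is matched by ('DWG-7', 'FROM', 'DWG-5').
    Returns the connectors left without a partner.
    """
    remaining = list(connectors)
    unmatched = []
    while remaining:
        drawing, direction, other = remaining.pop(0)
        partner = (other, "FROM" if direction == "TO" else "TO", drawing)
        if partner in remaining:
            remaining.remove(partner)   # consume exactly one partner
        else:
            unmatched.append((drawing, direction, other))
    return unmatched


connectors = [
    ("DWG-5", "TO", "DWG-7"),
    ("DWG-7", "FROM", "DWG-5"),
    ("DWG-5", "TO", "DWG-9"),
]
print(unmatched_connectors(connectors))  # [('DWG-5', 'TO', 'DWG-9')]
```

Consuming one partner per match means two TO connectors facing a single FROM still leave one flagged for review.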
What does the integrity report cover?
Orphan equipment, unmatched off-page connectors, duplicate tags, instruments with missing equipment references, lines with missing size or service, and class-breakdown sanity checks. Each finding is a candidate for review, not necessarily an error.
Do I need to clean up the scans before uploading?
Cleaner scans produce cleaner extraction. Deskew, denoise, and contrast adjustment all improve results. The pipeline can extract from rough scans but the review-flag rate climbs and so does review time. If you have time to run the source files through a deskew tool, it pays back in review hours.
What about title block data?
Drawing number, revision, and project number are extracted from the title block when readable. The values are used for cross-page joining and for stamping the output documents, so revision E of drawing 100 is never confused with revision E of drawing 200.
Can I run digitization on a phone photo?
JPG and PNG inputs are supported. Phone photos work if the photo is a sharp, well-lit, straight-on shot of a complete drawing. Warped, partial, or low-light photos produce more flagged rows than scans do. For one-off field corrections phone photos are useful. For a full digitization pass scans are still the right input.
What stays consistent across revisions?
Tag identity is the spine. If FIT-101 exists in rev A and rev B, it is the same instrument unless the drawing explicitly says otherwise. The diff between revisions is what was added, removed, or modified at the column level. This becomes the basis of the management of change document.
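A tag-keyed diff at the column level might look like the sketch below. It assumes each revision has been reduced to a `{tag: {column: value}}` mapping; the column names in the example are illustrative.

```python
def diff_revisions(rev_a, rev_b):
    """rev_a, rev_b: {tag: {column: value}} for two revisions of the same list."""
    added = sorted(set(rev_b) - set(rev_a))
    removed = sorted(set(rev_a) - set(rev_b))
    modified = {}
    for tag in sorted(set(rev_a) & set(rev_b)):
        columns = set(rev_a[tag]) | set(rev_b[tag])
        # Keep only the columns whose value changed, as (old, new) pairs.
        changes = {
            col: (rev_a[tag].get(col), rev_b[tag].get(col))
            for col in sorted(columns)
            if rev_a[tag].get(col) != rev_b[tag].get(col)
        }
        if changes:
            modified[tag] = changes
    return {"added": added, "removed": removed, "modified": modified}


rev_a = {
    "FIT-101": {"line": "4-CS-101", "signal_class": "AI"},
    "TI-202": {"line": "2-CW-205", "signal_class": "AI"},
}
rev_b = {
    "FIT-101": {"line": "6-CS-101", "signal_class": "AI"},
    "PSV-303": {"line": "4-CS-101", "signal_class": ""},
}
print(diff_revisions(rev_a, rev_b))
```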