The items in the detailed section of value-added tax (VAT) invoices, such as product names and specifications, are highly flexible and complex in format and content, and the section lacks complete gridlines separating the information fields. Existing methods for structured recognition of all elements on VAT invoices suffer from low element recognition rates and high computational complexity. A structured recognition method for all information on the invoice face, based on morphological image processing, was proposed. First, morphological operations were used to detect the table lines of the invoice, and the text in the different areas of the invoice was segmented and recognized. Next, the implicit layout rules of the product details area of VAT invoices were exploited, together with the text-connected regions obtained through morphological operations, to construct a complete table structure. Finally, text was detected with a differentiable binarization network (DBNet) and recognized with a convolutional recurrent neural network (CRNN). The method was tested on a dataset of 49 VAT invoices in three different formats: the element recognition rates for the three formats reached 99.9%, 97.4%, and 98.8%, and the average running times per invoice were 0.90 s, 0.47 s, and 0.82 s, respectively. In whole-invoice structural recognition, the method outperformed several table recognition models and published methods used for comparison.
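The table-line detection step can be illustrated with a minimal sketch of morphological opening (erosion followed by dilation) with line-shaped structuring elements. It assumes a pure-Python list-of-lists binary image (1 = ink) rather than a real imaging library such as OpenCV, and the function names, the length thresholds `min_h` and `min_v`, and the connected-component step for the remaining text are hypothetical illustrations, not the authors' implementation.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink (foreground), 0 = background


def transpose(img: Image) -> Image:
    return [list(col) for col in zip(*img)]


def erode_rows(img: Image, length: int) -> Image:
    """Erode each row with a flat 1 x `length` element anchored at its left end:
    out[y][x] = 1 only if img[y][x : x + length] fits in the row and is all ink."""
    width = len(img[0])
    return [[1 if x + length <= width and all(row[x:x + length]) else 0
             for x in range(width)] for row in img]


def dilate_rows(img: Image, length: int) -> Image:
    """Dilate each row with the same left-anchored element:
    out[y][x] = 1 if any start position in [x - length + 1, x] is set."""
    width = len(img[0])
    return [[1 if any(row[max(0, x - length + 1):x + 1]) else 0
             for x in range(width)] for row in img]


def open_rows(img: Image, length: int) -> Image:
    """Opening keeps only horizontal ink runs at least `length` pixels long."""
    return dilate_rows(erode_rows(img, length), length)


def detect_table_lines(img: Image, min_h: int, min_v: int) -> Tuple[Image, Image]:
    """Return (horizontal, vertical) line masks; vertical lines are found by
    applying the row operations to the transposed image."""
    horizontal = open_rows(img, min_h)
    vertical = transpose(open_rows(transpose(img), min_v))
    return horizontal, vertical


def text_regions(img: Image, horizontal: Image, vertical: Image):
    """Bounding boxes (top, left, bottom, right) of 8-connected ink components
    left after the detected ruling lines are removed (hypothetical text step)."""
    h, w = len(img), len(img[0])
    residual = [[img[y][x] and not (horizontal[y][x] or vertical[y][x])
                 for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not residual[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:  # breadth-first flood fill of one component
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and residual[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Toy "invoice cell": a ruled box with a short text stroke inside.
page = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 1, 1, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],
]
h_mask, v_mask = detect_table_lines(page, min_h=5, min_v=4)
print(text_regions(page, h_mask, v_mask))  # [(1, 2, 1, 3)]
```

Because the structuring element is longer than any character stroke, short text runs are erased by the erosion while long ruling lines survive and are restored to full length by the dilation.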

| Family | Number of genera | Number of species | Percentage of total species (%) | Genus | Number of species | Percentage of total species (%) |
|---|---|---|---|---|---|---|
| Amanitaceae | 2 | 11 | 5.26 | Amanita | 10 | 4.78 |
| Mycenaceae | 2 | 12 | 5.74 | Inocybe | 5 | 2.39 |
| Polyporaceae | 8 | 14 | 6.70 | Laccaria | 5 | 2.39 |
| Russulaceae | 3 | 23 | 11.00 | Marasmius | 6 | 2.87 |
| | | | | Mycena | 11 | 5.26 |
| | | | | Pluteus | 5 | 2.39 |
| | | | | Russula | 17 | 8.13 |
| | | | | Trametes | 5 | 2.39 |
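The percentage columns are simply each species count divided by the total number of species. The total is not stated in the table; the sketch below assumes a total of 209, inferred from the table's own ratios (for example 23 / 0.1100 ≈ 209), and checks that this assumption reproduces every printed percentage.

```python
# Assumption: TOTAL_SPECIES = 209 is inferred from the table's ratios,
# not a figure stated in the table itself.
TOTAL_SPECIES = 209

# (taxon, number of species, percentage as printed in the table)
ROWS = [
    ("Amanitaceae", 11, 5.26), ("Mycenaceae", 12, 5.74),
    ("Polyporaceae", 14, 6.70), ("Russulaceae", 23, 11.00),
    ("Amanita", 10, 4.78), ("Inocybe", 5, 2.39),
    ("Laccaria", 5, 2.39), ("Marasmius", 6, 2.87),
    ("Mycena", 11, 5.26), ("Pluteus", 5, 2.39),
    ("Russula", 17, 8.13), ("Trametes", 5, 2.39),
]


def percentage_of_total(species: int, total: int = TOTAL_SPECIES) -> float:
    """Share of all recorded species, in percent, rounded to two decimals."""
    return round(100 * species / total, 2)


for taxon, species, printed in ROWS:
    assert percentage_of_total(species) == printed, taxon
print("all percentages consistent with a total of", TOTAL_SPECIES)
```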