Accelerating Legislation Processes through Semantic Similarity Analysis with BERT-based Deep Learning

Document Type: Original Article

Authors

1 Faculty of Computer Engineering, Shahrood University of Technology, Shahrood, Iran

2 Faculty of Electrical and Computer Engineering, University of Science and Technology of Mazandaran, Behshahr, Iran

Abstract

Countries are governed by accurate and precise laws, and enacting appropriate and timely laws can drive national progress. Each law is a text that is added to the body of existing laws after passing a review process and receiving the approval of the assembly. When a new law is reviewed, the related laws must be extracted from the existing body of laws and analyzed. This paper presents a new method for extracting the laws relevant to a given text from an existing set of laws, using semantic similarity and deep learning techniques based on the BERT model. The proposed method encodes sentences or paragraphs of text as fixed-length vectors in a dense vector space. These vectors are then used to evaluate and score the semantic similarity of the sentences with the cosine similarity measure. Encoding with the BERT model lets the machine capture the meaning of the sentences, since the model takes the position of entities within a sentence into account. The method then computes the semantic similarity between documents and the degree of similarity between each document and a given subject. Results on the test dataset show that the method detects semantic similarities among legal documents related to the Islamic Consultative Assembly of Iran with precision and accuracy above 90%.
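The scoring step described in the abstract can be illustrated with a minimal sketch. It assumes the sentence vectors have already been produced by a BERT-based encoder; the three-dimensional vectors and the helper name `rank_laws` are hypothetical stand-ins chosen for illustration and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_laws(query_vec: Vector, law_vecs: List[Vector], top_k: int = 3) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the top_k existing laws most similar to the query."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(law_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for BERT sentence embeddings (hypothetical values).
law_vectors = [
    [1.0, 0.0, 0.0],  # existing law 0
    [0.6, 0.8, 0.0],  # existing law 1
    [0.0, 0.0, 1.0],  # existing law 2
]
query_vector = [1.0, 0.0, 0.0]  # vector of the new law under review

ranking = rank_laws(query_vector, law_vectors, top_k=2)
for index, score in ranking:
    print(f"law {index}: similarity {score:.2f}")
```

In practice the toy vectors would be replaced by the output of whichever sentence encoder is used, and the ranked list would give reviewers the existing laws most related to the new text.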
