zhiqingstudy


In day-to-day work we sometimes need to split Chinese text into sentences at punctuation marks. The result is shown in the figure below.

[Figure: Python implementation of splitting sentences at punctuation marks]

How do we split text into sentences at punctuation marks? In the example in the figure above, getting the two sentences "Python Basic Tutorial" and "Python Introduction Tutorial (very detailed)" requires splitting the whole text at the comma ",". How can this be done in Python?
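For a single delimiter, Python's built-in str.split is enough. A minimal sketch, assuming the example text from the figure uses an ASCII comma (Chinese text often uses the full-width comma "，" instead, in which case that character is the delimiter):

```python
# Example text assumed from the figure: two titles joined by a comma
text = "Python Basic Tutorial,Python Introduction Tutorial (very detailed)"

# str.split cuts the string at every occurrence of the delimiter
parts = text.split(",")

for sentence in parts:
    print(sentence)
# prints:
# Python Basic Tutorial
# Python Introduction Tutorial (very detailed)
```

str.split accepts only one delimiter, so splitting at several different punctuation marks needs a different approach.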

Idea: replace every punctuation mark you want to split on with one fixed marker, such as "-", then take the text between consecutive "-" markers as separate sentences.
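The replace-then-split idea can be sketched on an in-memory string before adding file handling. The helper name split_by_marks and the sample text are hypothetical; note that any "-" already present in the text will also be treated as a split point.

```python
def split_by_marks(text, marks, placeholder="-"):
    """Replace every mark with one placeholder, then split on that placeholder."""
    for m in marks:
        text = text.replace(m, placeholder)
    # Drop empty pieces produced by adjacent marks, e.g. "?," becomes "--"
    return [piece for piece in text.split(placeholder) if piece]


# Hypothetical sample text mixing several punctuation marks
sample = "Python Basic Tutorial,Python Introduction Tutorial (very detailed)?Zhihu|Rookie Tutorial"
print(split_by_marks(sample, ["?", ",", "|"]))
# ['Python Basic Tutorial', 'Python Introduction Tutorial (very detailed)', 'Zhihu', 'Rookie Tutorial']
```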

Key concepts: Python's re module, regular expressions, and file operations.

The code is as follows:

from pathlib import Path
import re  # Import required modules

p1 = Path('1.txt')  # Path of the original file; keeping it in the same directory as the script is recommended
with p1.open('r', encoding='utf-8') as file:  # Open the original file (UTF-8 encoding assumed)
    article = file.read()  # Read the full text of the original file

# The punctuation marks to split on are stored here
# (for Chinese text you may also need full-width marks such as '，' and '？')
mark = ['?', ',', '-', '|', '_', '–', '\n']
for m in mark:  # Replace every mark to split on with "-"
    article = article.replace(m, '-')

# Match each run of text that ends at a "-" or at the end of the text,
# so the last sentence is kept even if no mark follows it
regex = re.compile(r'[^-]+(?=-|$)')
r = regex.findall(article)

t = set(r)  # Store the sentences in a set to remove duplicates (a set does not keep the original order)
with open('2.txt', 'w', encoding='utf-8') as newfile:  # Create a new txt file to store the sentences
    for x in t:
        newfile.write(x + '\n')  # One sentence per line

The code above splits the text into sentences at the chosen punctuation marks. The result is shown in the figure below.

[Figure: Python implementation of splitting sentences at punctuation marks]
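The placeholder step can also be skipped: re.split accepts a pattern, so a character class listing all the marks splits the text in one pass. This is a swapped-in alternative, not the method above, and it works on a string rather than on files. The names MARKS, SPLIT_RE and split_sentences are hypothetical, and two output choices are assumptions: strip() removes surrounding spaces, and dict.fromkeys removes duplicates while keeping the original order (unlike set()).

```python
import re

# Hypothetical alternative: a character class lists every mark to split on.
# re.escape guards characters that are special inside a class, such as "-".
MARKS = ["?", ",", "-", "|", "_", "–", "\n"]
SPLIT_RE = re.compile("[" + "".join(re.escape(m) for m in MARKS) + "]")


def split_sentences(text):
    """Split text at any mark in MARKS and remove duplicates, keeping first-seen order."""
    pieces = (p.strip() for p in SPLIT_RE.split(text))
    # dict.fromkeys drops duplicates while preserving insertion order (Python 3.7+)
    return list(dict.fromkeys(p for p in pieces if p))


text = "Python Basic Tutorial,Python Introduction Tutorial (very detailed)\nZhihu|Zhihu"
print(split_sentences(text))
# ['Python Basic Tutorial', 'Python Introduction Tutorial (very detailed)', 'Zhihu']
```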

Improvement: short fragments such as "Baidu Encyclopedia", "Rookie Tutorial" and "Zhihu" are sometimes not what we want. To filter them out, set a minimum length for the sentences the regular expression matches, so that sentences with too few characters are skipped.
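One way to do this, sketched under assumptions: replace the + quantifier in [^-]+ with a minimum count {n,}, so only runs of at least n characters between "-" markers are matched. The threshold MIN_LEN and the sample text are hypothetical; for Chinese text each Chinese character counts as one character, so pick the threshold for your own data.

```python
import re

MIN_LEN = 16  # hypothetical threshold: minimum number of characters a sentence must have

# Same idea as the pattern in the code above, but {MIN_LEN,} replaces +,
# so runs shorter than MIN_LEN characters between "-" markers are not matched
regex = re.compile(r"[^-]{" + str(MIN_LEN) + r",}(?=-|$)")

# Text after the punctuation marks have already been replaced with "-"
article = "Python Basic Tutorial-Zhihu-Rookie Tutorial-Python Introduction Tutorial (very detailed)"
print(regex.findall(article))
# ['Python Basic Tutorial', 'Python Introduction Tutorial (very detailed)']
```

Here "Zhihu" (5 characters) and "Rookie Tutorial" (15 characters) fall below the threshold and are dropped.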
