PHP - Regex for words connected by hyphen and underscore while keeping punctuation
I have been reading, searching, and trialling different ways to write the regex, such as \p{L}, [a-z], and \w, but I can't seem to get the results I am after.
Problem
I have an array made of full sentences with punctuation. I am parsing through the array using the following preg_match_all, which works well in keeping the words and the punctuation.
preg_match_all('/(\w+|[.;?!,:])/', $match, $matches)
However, I also have words like these:
- word-another-word
- more_words_like_these
and I would like to be able to retain the integrity of these words (keep them connected), but my current preg_match breaks them down into individual words.
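The behaviour described above can be reproduced with a short script. This is only a sketch: the sample sentence is made up for illustration and is not taken from the original array.

```php
<?php
// Hypothetical sample sentence, not taken from the original data.
$match = 'A word, like_this connected; with-relevant-punctuation.';

// The pattern from the question: a run of \w characters, or one punctuation mark.
preg_match_all('/(\w+|[.;?!,:])/', $match, $matches);

// $matches[0] holds the full matches in order of appearance.
print_r($matches[0]);
```

Note that the underscore is already part of \w, so like_this comes back as a single token. Only the hyphenated word is split, because the hyphen is matched by neither alternative and is skipped.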
What I tried
preg_match_all('/(\p{L}-\p{L}+|[.;?!,:])/', $match, $matches)
and:
preg_match_all('/((?i)^[\p{L}0-9_-]+|[.;?!,:])/', $match, $matches)
which I found here,
but I cannot achieve the desired outcome:
Array ( [0] a, [1] word, [2] like_this, [3] connected, [4] ;, [5] with-relevant-punctuation )
Ideally, I would also like to be able to account for special characters, as some of these words may have accents.
Just insert the hyphen into the character class. Note that the hyphen needs to appear at the beginning or the end of the set of characters; otherwise it will be treated as a range symbol.
(\w+|[-.;?!,:])
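Where the hyphen is inserted matters. With the hyphen in the punctuation class, `\w+` still stops at each hyphen, so every hyphen is returned as its own token. If the goal is to get hyphenated words back whole, one possible variant is to put the hyphen in a class together with `\w`. The following is a minimal sketch under that assumption; the sample sentence is hypothetical.

```php
<?php
// Hypothetical sample sentence for illustration.
$match = 'A word, like_this connected; with-relevant-punctuation.';

// The hyphen is placed last inside the word class, so it is a literal
// character and not a range operator.
preg_match_all('/([\w-]+|[.;?!,:])/', $match, $matches);

// Full matches, in order of appearance.
print_r($matches[0]);
```

With this variant, with-relevant-punctuation is returned as one element. A lone hyphen between spaces would also count as a "word" here; if that is not wanted, `\w+(?:-\w+)*` is a stricter alternative for the first branch.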
Examples
Live Demo
https://regex101.com/r/yi3tm4/2
Sample text
However, I also have words like these: word-another-word more_words_like_these and I would like to be able to retain the integrity of these words (connected) but my current preg_match breaks them down into individual words.
Sample matches
The other words are captured as before, and the words with hyphens are captured as well.
Matches 1-9 omitted for brevity
MATCH 10
1. [39-56] `word-another-word`
MATCH 11
1. [57-78] `more_words_like_these`
Matches 12+ omitted for brevity
Explanation
NODE                     EXPLANATION
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    [-.;?!,:]                any character of: '-', '.', ';', '?',
                             '!', ',', ':'
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
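As the explanation notes, `\w` without further flags covers a-z, A-Z, 0-9 and the underscore, which does not include accented letters. For the accented words mentioned in the question, a possible approach is to use Unicode properties together with the `u` modifier. This is a sketch only: the sample sentence is hypothetical, and both the script file and the input are assumed to be UTF-8.

```php
<?php
// Hypothetical sample with accented words; file and input assumed to be UTF-8.
$match = 'Un café-crème, déjà_vu; voilà!';

// The u modifier makes PCRE treat the pattern and the subject as UTF-8.
// \p{L} is any letter and \p{N} any number; the underscore and the hyphen
// are added as literals, with the hyphen last so it is not a range.
preg_match_all('/([\p{L}\p{N}_-]+|[.;?!,:])/u', $match, $matches);

// Full matches, in order of appearance.
print_r($matches[0]);
```

If the input may contain decomposed accents (a base letter followed by a combining mark), adding `\p{M}` to the class would be one way to keep those marks attached to their letters.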