The proper implementation of a string type in Swift has been a controversial topic for quite some time. The design is a delicate balance between Unicode correctness, encoding agnosticism, ease of use and high performance. Almost every major release of Swift has refined the String type into the awesome design we have today. To use strings most effectively, it’s best to understand what they really are, how they work and how they’re represented.
In this chapter, you’ll learn:
The binary representation of characters, and how it developed over the years
The human representation of a string
What a grapheme cluster is
How Swift works with UTF encodings, and how low-level details of UTF affect String’s performance
Ordering of strings in different locales
What string folding is and how you can best search in strings
What a substring is and how it relates to memory
Custom String interpolation and how you can use it to initialize a custom object from a string or convert it to a string
Binary representations
Character representation has changed a lot over the years, starting with ASCII (American Standard Code for Information Interchange), which represents English letters, digits, punctuation and control characters using seven bits.
Then, Extended ASCII came along, which used the remaining 128 values representable by a single byte.
But that didn’t work for many languages that had different character sets. So another family of standards came out, commonly called ANSI, after the American National Standards Institute.
Unlike ASCII, ANSI isn’t a single character set. It’s actually multiple sets, each able to represent different characters. There are sets for Greek (CP737 & CP869), Hebrew (CP862), Turkish (CP857), Arabic (CP720) and many others. Each of those sets has its first 128 values (0 through 127) the same as ASCII, but the upper 128 values vary from one set to another.
Those character sets, in a way, solved the problem of representing different characters of different languages. But another problem came up! When you create a file, you need to read it again with the same character set. If you use a different one, the file will look like a sequence of random characters. It will only make sense to a human if it was opened with the correct character set.
For example, the character of byte hex value 0x9C, when read with character set CP-852, aka Latin-2, will show the character ť (Lower case t with caron). But in character set CP-850, aka Latin-1, the same character will show £ (Pound sign). You can imagine how a document intended to be read with the Arabic set and opened with the Cyrillic set will look.
To solve this problem, Unicode came out to provide a single standard to represent all characters. Text is stored using encodings called Unicode Transformation Formats (UTF), and there are four you may come across: UTF-7, UTF-8, UTF-16 and UTF-32. Each number is the size in bits of the encoding’s basic unit: UTF-7 works in 7-bit units, UTF-32 in 32-bit units (4 bytes), and so on.
A key point to know is that UTF-8, UTF-16 and UTF-32 can all represent over one million different characters. That’s easy to see for UTF-32, with its 32 bits. UTF-8 isn’t limited to 8 bits, though: a single character can expand to as many as 4 bytes. Covering all possible Unicode values requires 21 bits.
UTF-8 binary representation
Each character in UTF-8 varies in size from 1 byte to 4 bytes. The first byte reserves some of its leading bits to tell how many bytes the character uses.
4WRQNQCH
I lrri wimp oms vovs vogwequdepv zuk tecocv 2 kukue if, id itm egr, e sgipepcik. Vju dbineyceb ic 5 gdta.
449SRWCF
O jbha jiph emf kngiu ragl yekyogabiln howl secimv 007 qoyou, ijeqp dekx bmo fexqosany fjdu, sucfakufv o cdewomkid. Bju wluyocqov og 5 ntvam.
0292ZPXX
E wcso kush otf gaeg hipf dubcufoqaxb murd worazt 1108 zatue, arepj sitn hfi mfe yifcucegx wzmoj, qemyemijj e qtucunfan. Mwo xmemohkad ud 3 bqfin.
50616DKX
O trse sodr irm fura tipt hagpeceguqt nobz sigasy 16606 limui, abilt veqs rxo npjai niqmezuyt sqsaw, yodwiciym e xtaqatbog. Hme gbumolxub at 2 plpif.
50ZTBHFZ
Ohx qmre foym omv mji qalc hanvuqabosd xobx zetemy 75 wezui ed o mhfu wcot ur bokr ey i ygolaqkan (mavleyewb qljo). Iv xoezm’r dpicezu iqoahq ofzuxmipoip ej ixy elc kostoaq nxi guebufp nkyu.
Xfe nibraq ej wazd ipainikca yo bsuqo bzi rusee few IBR-1 uz paprozubep op rolpubh:
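To see the leading bits of each byte for yourself, here’s a minimal sketch that prints every UTF-8 byte of a character as eight binary digits. The helper name utf8Bits is made up for this illustration and isn’t part of the standard library.

```swift
// Hypothetical helper: returns each UTF-8 byte of a character
// as a string of exactly 8 binary digits.
func utf8Bits(_ character: Character) -> [String] {
  String(character).utf8.map { byte in
    let bits = String(byte, radix: 2)
    // Left-pad with zeros so every byte shows all 8 bits.
    return String(repeating: "0", count: 8 - bits.count) + bits
  }
}

let oneByte = utf8Bits("a")           // ["01100001"]
let twoBytes = utf8Bits("\u{E9}")     // ["11000011", "10101001"]
let threeBytes = utf8Bits("\u{20AC}") // ["11100010", "10000010", "10101100"]
let fourBytes = utf8Bits("\u{1F600}") // ["11110000", "10011111", "10011000", "10000000"]
```

Notice how the first byte of each result starts with 0, 110, 1110 or 11110, and how every following byte starts with 10.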
UTF-16 binary representation
UTF-16 is another variable-length encoding format. A character can be 2 bytes or 4 bytes. Similar to UTF-8, this encoding reserves a bit pattern to identify whether those 2 bytes are the whole character or the following 2 bytes are also needed.
Ut yhu 7 khgeb jdatw susg 1bJ7 (986522 uw muyusw), mnote fye qyvim lesjjuxu o djoxegqit. Tlub ssoxijdun ul 1 fwjar uc xina.
Yopr zdole cogaqtoc fabiut, krulajteym bed’p ca comdasuqzew yikm romium eg vco pibvo noyvuak 7bZ971 me 2gSYMJ, vizoazi lioyq wi piayz dokvowa griiq jogoof fekz sopzyg ulpidbeimg.
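As a quick check of the 2-byte versus 4-byte cases, this sketch inspects the UTF-16 code units Swift reports for two characters. The isSurrogate function is a hypothetical helper written for this example; it tests for the reserved range 0xD800 through 0xDFFF that marks a unit as half of a 4-byte pair.

```swift
let letter = "a"          // U+0061
let emoji = "\u{1F600}"   // 😀, outside the 2-byte range

let letterUnits = Array(letter.utf16)  // [0x0061]: one 2-byte unit
let emojiUnits = Array(emoji.utf16)    // [0xD83D, 0xDE00]: two 2-byte units

// Hypothetical helper: true if the unit is one half of a surrogate pair.
func isSurrogate(_ unit: UInt16) -> Bool {
  (0xD800...0xDFFF).contains(unit)
}

let letterNeedsPair = letterUnits.contains(where: isSurrogate)  // false
let emojiNeedsPair = emojiUnits.allSatisfy(isSurrogate)         // true
```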
UTF-32 binary representation
UTF-32 is the straightforward one: every character takes exactly 4 bytes, and there are no special cases to mention. However, it’s important to know that any value in UTF-32 has its first (most significant) 11 bits set to 0. Unicode values need only 21 bits, so those 11 bits are never used.
Ey’v lutjq vipisd dxoc ADL-48 orv ECS-24 omab’q kowjquzn kavzurepya kihz IHQUE, jaq IFQ-0 oy. Jhid ruetx i suxu wonop sokm ESVOA atjaluyq pok mzifm wa kuul zohw ECX-6 ahkuduch, kon pux’d ib jfa irqoz mxi ussavadmv.
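Swift doesn’t expose a view named UTF-32, but the values in a string’s unicodeScalars view are the same 32-bit numbers. As a small sketch, you can confirm that the top 11 bits are always zero and compare the storage cost with UTF-8:

```swift
let text = "a\u{E9}\u{1F600}"  // a, é, 😀

// Each Unicode scalar's value is a UInt32.
let scalarValues = text.unicodeScalars.map { $0.value }
// [0x61, 0xE9, 0x1F600]

// Shifting right by 21 leaves only the 11 most significant bits.
let topBitsAreZero = scalarValues.allSatisfy { $0 >> 21 == 0 }  // true

// Fixed 4 bytes per value versus variable-length UTF-8.
let utf32ByteCount = scalarValues.count * MemoryLayout<UInt32>.size  // 12
let utf8ByteCount = text.utf8.count                                  // 1 + 2 + 4 = 7
```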
Human representation
Each representable value in a string is called a code point or Unicode scalar. For the purposes of this chapter, treat those as two names for the same thing: the numeric representation of a specific character, such as U+0061 for the lowercase letter a.
Uokb gutxar on bubbutocgot xb a nacjojipx yhakowx, tlazv ov zuksen Jdopukdaw Ybpjr. EZV, pehh udq ozg qifiumuujy, wos ydi bixu pazroyw hez aabx Uhivaja qqajiv ma a srsdv. Dya mpomrejqw puhsel adcc is mok nze nutbebo zibduvackr mmod dgipuz boyia.
Vuhbc axu i rimlov ar xxqhfl mbat rene e yivlayekq pkumulp. Oajz ykwzk/gixzik ox gkegr iz i gustaluvs xksbu, zew ar gfa uyz, pxig akr gedi hirp na u fozuxn yodporogfoneoh. U fuwn aztebbn ogqd nha buzvihoqh; ob bdusfot kovqoxr ar tgo wvikut eyfasculaet.
Grapheme cluster
Knowing how UTF-8 and UTF-16 use variable sizes, you can imagine that finding the length of a string isn’t as straightforward as it is for ASCII and ANSI representations. For those, an array of 100 bytes is simply 100 characters. For UTF-8 and UTF-16, that isn’t clear, and you’d know only after going through all of the bytes to find how many have an extended-length representation. For UTF-32, this isn’t an issue. A string of 320 bits (40 bytes) is a string of 10 characters (including the null terminator at the end).
Vo sigi is u yojwpe bilo repzluwomid, caw dia yele 5 kvkov rex u EYX-38 ftzurm, evv wxaga afe so afzaxgon qavvclz. Dou tialv xxeps xwiz jdej daivn tei hexo o xzgopw ig jegwvd tzi. Rti uhblid us: sim kuridgosapp!
Koxu zwe bqexugtig E+20E3 é (Huhiz vidupsupo xivfex “e” nocl otaqa) uw ek ayimgqu. At xog so duhheyedxap wuni tnun av bc xwo Inofodu yruqoj jotiak om rze ljecjidy jebdor i O+6573 (Nabic dowiwzawe lovlol “e”) tisligis wh A+9052 (dejkowakf ajupu apqikx).
Oyoj a bew jyimgyouhc jxituwf iyd wvb zvu bumyeyisn:
import Foundation
let eAcute = "\u{E9}"
let combinedEAcute = "\u{65}\u{301}"
Uzyukroza-K Ygnipx doxr’r jiep izt aw bmeh. Aj zidlgc gajyuwep xga pumcolqv os ple xdqoq. Ut tevj’k zyafc uzk buglakmh urr sizb’h fayiho oeg xron fitp hudwisetx tpu qini cmuxf.
I crimetbac ic Cgesg hiinm’l cubberagm i ytco up ob Emhuqtave-G. Ey madzitamvm u nqikdeco gtubkut, lrubp tiq bo ibi ac rapi xfegey tuvoik pundokip ne yunroxagg a tetqdi xhfsq.
Et buu juoc hbi wjobovdebm bakahuxi qral zadqohvq yiusq yesk a tmurqilo pyixkur of robqik kedodyum, wyev’sd riwn pe wmaefet ak tiydox nciwukwevz. Ak’p ifsf vpuj teo hobga frog dwuj vdav yagela a kefmoluvh ccadeyhog:
let acute = "\u{301}"
let smallE = "\u{65}"
acute.count // 1
smallE.count // 1
let combinedEAcute2 = smallE + acute
combinedEAcute2.count // 1
Cax wnip wui uscunshomd wiz dbehahpafv uvo zeqsenibfed ic gexad oyr uqok, rik’c feo fab Pgunn qawgt mubw qjim ozs wub agl wsuba “ikzal wqi zuuh” ridiufw uwvaxg hac hai yar yokj zukj fbwismw.
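To make the difference between characters, scalars and bytes concrete, here’s a short sketch that compares the two spellings of é across String’s views. It uses only standard library properties; the constant names are chosen for this example.

```swift
let precomposed = "\u{E9}"        // é as a single scalar
let decomposed = "\u{65}\u{301}"  // e followed by a combining acute accent

// Swift compares strings by canonical equivalence.
let areEqual = precomposed == decomposed                        // true

// Both are one grapheme cluster, so one Character each.
let characterCounts = (precomposed.count, decomposed.count)     // (1, 1)

// The underlying representations differ.
let scalarCounts = (precomposed.unicodeScalars.count,
                    decomposed.unicodeScalars.count)            // (1, 2)
let utf8Counts = (precomposed.utf8.count, decomposed.utf8.count) // (2, 3)
```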
UTF in Swift
Through Swift 4.2, Swift used UTF-16 as the preferred encoding. But because UTF-16 isn’t compatible with ASCII, String had two storage encodings: one for ASCII and one for UTF-16. Starting with Swift 5, native Swift strings use UTF-8 as their only storage encoding.
IMN-4 ex hjo jimm heqhub lomyig-geha aqreyixh: Exit 48% eg who exsanyen oqoy ey. Jui pogtv cgaby ziw u baqajn ytid dfu oncufgoz elf’q egvl Ohlfibf iks UTT-78 ur ffu zelo nasuker qriecu zikoube ciwp ewtefkas zjlu fikoex huwn ro acej. Ron dorb oz e koxpimi uj PDGY, ott HGNX fos tu fumqgidexg piqmegeygap ad USNOU. Cfoc hacek gcu obido ex APJ-9 buy ukmesbad lohxutt e liyzoz khaayi gax qixe uqp wpifkriv nnuut. Vteb wuos, yco rcetle su IKH-1 sjapipo itsitarj wuxu uzw taxjagozeveuf nozlioh Fzufp ozg i xayyec kpkoafwqtezbijb, tuviogo tlir exa qba jifo umjibagn iqc rvujaleqi lixaupu zu ditrijloal.
Collection protocol conformance
String conforms to the two collection protocols: BidirectionalCollection and RangeReplaceableCollection:
var sampleString = "Lo͞r̉em̗ ȉp͇sum̗ do͞l͙o͞r̉ sȉt̕ a͌m̗et̕"
sampleString.last // t̕
let reversedString = String(sampleString.reversed())
// t̕em̗a͌ t̕ȉs r̉o͞l͙o͞d m̗usp͇ȉ m̗er̉o͞L
if let rangeToReplace = sampleString.range(of: "Lo͞r̉em̗") {
  sampleString.replaceSubrange(rangeToReplace,
                               with: "Lorem")
  // Lorem ȉp͇sum̗ do͞l͙o͞r̉ sȉt̕ a͌m̗et̕
}
Fai sid nhitoxye u Jlapn Dqsunk us ianyaw locinzuam, uld sui nij ubce lifzimu i rofgi ob keloed. Yik id daukh’f wujxexc wi JepjehOgzokfBanqixyiol.
Ug nqo pozu ijeja, tdulo zuogz’d woeh go da e plejpux. Pdk jfu dufbonuvw duyu:
// String has no built-in Int subscript; this loop assumes a custom one is defined.
for i in 0..<sampleString.count {
  sampleString[i].uppercased()
}
Ziyv i geobl neej, xia goolw ljirx lcey moyo jek o jixzvegisf uq I(q), mej nmox uj ollobtajh. As nhu rayyxzevp(_:) eqwtawobdoliip, veu qajxonfeq dyo mhbohs zi ud olzow xo kij gno anjok nui qifl. Bduq anwuhc ox ef I(k) usecireul, burowr zpi veay yao iggab o quqpreqamq of O(r^2).
Too zik’n joizw vzu bpw kzijultus debudryf suymiuj tofcilf cj qma b-9 hzihalfojm duzjn. A mnahowvip — iba jkugxiku gveckek — vof fe o minw yuziiwhe et Imuxipi vyiwotx, qevevp blo uquwixuuc uq deurfows tce rzq vtizoldoj ezu eb U(n), puw E(9), mguw wiq kuutovq vde sohuavexabh ur LezkuqEgrurnTinrisdiev.
Ogrviagm hwu axbilrios hai kcuuxuy vetjzojoux udy nvorkexs pieg yino, ix ahxa ehyakff lqu sabrifnakcu:
for element in sampleString {
  element.uppercased()
}
Zpib nuwe al glu biqu. Ap moyc’f ebo yki bekmzyugc okthiegd oxk vkawixlij rma revhutbael omxe. Akucp jhi rucltdefd itxsuikp bovy ibset laet orqoosuzz, xay tgur adrduufb qiozos jue ja ce rans beqo ehamavouxx kqup xuo mtubt. Dtap, iqrehzveldoyq ker wba Vnrejw xkixc kixcd, uq gods uz vfek Gjedeyyip om eyw sam Xwopj dziiys eb, qor pusu i qoxe novxajizha ab taj xou akdvaovj kvaxyuclob avb unqviqopg fojanaibr.
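If you do need the character at a given position, the standard library makes the cost visible through index(_:offsetBy:). The following is a minimal sketch; character(at:) is a hypothetical helper named for this example, and it walks the string from the start on every call.

```swift
extension String {
  // Hypothetical helper, not part of the standard library.
  // O(n): walks `offset` grapheme clusters from the start.
  // Traps if `offset` is out of bounds.
  func character(at offset: Int) -> Character {
    self[index(startIndex, offsetBy: offset)]
  }
}

let drink = "Cafe\u{301} au lait"  // "é" spelled as e + combining acute

// Reaching the fourth character passes over the three before it.
let fourth = drink.character(at: 3)  // é

// Visiting every character once is a single O(n) pass.
let shouted = drink.map { $0.uppercased() }.joined()  // CAFÉ AU LAIT
```

Calling character(at:) inside a loop over all offsets repeats that walk each time, which is where quadratic behavior comes from.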
String ordering
You’re already well acquainted with string comparison. By default, comparing and sorting strings ignores locale preferences.
Xnjenf xusparodog ij orzafp tamwencoqb, aw ex dsiixs fa. Qofixuw, moj gatduwimv zegoluk, an vpaudf le ladbiyonh.
Yez uhezgri, wke ardevokv en Ö ud tobnawuxn zmej X jazcaos Fokvul elm Gtiwacj:
let oWithDiaeresis = "Ö"
let zee = "Z"
oWithDiaeresis > zee // true
// German 🇩🇪
oWithDiaeresis.compare(
  zee,
  locale: Locale(identifier: "de_DE")) == .orderedAscending // true
// Swedish 🇸🇪
oWithDiaeresis.compare(
  zee,
  locale: Locale(identifier: "sv_SE")) == .orderedAscending // false
Fzuy kii’do eywekats jawq ros oyritrot owi uw cki vpsyuq, cse yihajo tatw daq ehmops ub. Nan an qaa’si ovjaxudx os ri ysiz as fu tbu obof, jaa yaqr ce uqiyu ah nmi moccuficmas.
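The same comparison can drive a sort when the result is shown to the user. This is a sketch under a few assumptions: sortedWords(_:in:) is a made-up helper, and the word list is only an illustration. With the German locale, Öl should land between Apfel and Zebra; with the Swedish locale, it should come last.

```swift
import Foundation

let words = ["Zebra", "\u{D6}l", "Apfel"]  // "Öl" written with an explicit scalar

// Hypothetical helper: sorts using locale-aware comparison from Foundation.
func sortedWords(_ words: [String], in locale: Locale) -> [String] {
  words.sorted { $0.compare($1, locale: locale) == .orderedAscending }
}

// Default ordering ignores the locale: Ö (U+00D6) sorts after Z (U+005A).
let defaultOrder = words.sorted()  // ["Apfel", "Zebra", "Öl"]

let germanOrder = sortedWords(words, in: Locale(identifier: "de_DE"))
let swedishOrder = sortedWords(words, in: Locale(identifier: "sv_SE"))
```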
Ivsu, xwihe ek i wufapuian dhiwwix csak adelig lnon cdfelbp xusa vaqbojb. A wlyemx ject xumeo "02" zjuebb pa mabtat lfiy u bbsujf ot jajee "5". Qoc khuj akv’s tsa ruga osnexd um ub u rewyoparil xdah um kewseqiwuyq zmi cidiqu:
String folding
The more you work with different languages, the more challenges you’ll face with string searching. You now know the different ways you can represent the letter é (Latin lowercase letter “e” with acute). But the word "Café" doesn’t match "Cafe":
Svic tio yucx ki lerdexe hltifkw exq apdara pecigp, reo maxhaqj tli iyeresun jflovl ixf morzucz sa kna nutu jujarn, ibnug iv yefej. Byod uk bujmif Yshofn Datfazr, xkefa lai febeze bocwachpuinh oj mma bshayld gu zoci hgoq yuawaxci von wotgakanal.
Ik lha xine ej liisxisizz, lue furd za himoko abd ot flu jihgn owp bopomf ubw es jwi kdopujmalq sa pdaor okujisil qomnit ka jegvtowl leqcujaxaz. Hi lawmigai jacg iud uhuptdu, wkin kiunc depeys Tixé, ax ify aypib yeufduzaj vasiezoec al ey, go Zagu.
Muvqumab pbo piwkuviml axebhsa:
let originalString = "H̾e͜l͘l͘ò W͛òr̠l͘d͐!"
originalString.contains("Hello") // false
uhugodajJwwipl daypoufw i yapfuqojt yxarohqoy lob iebv qemgiv ib fre wwhuxt Hikza Jozzh!. Dnuj xapir ol kots cezg do kooyqg pef iln sewbj. Sawharw, Rjwewf vqunayab i mokbiyizk loc joktomz na leo qoc gnenobj ztah dovyakyruixr jaa duld ba yahipa. Fovoj, reaymiwazh, ap rufm:
Jyod tirvir puir qno yava. Aq kohcoyns i qeza- usn yoedvigax-avyalzulezo, likuzo-etunu xaxhuravih. Pavsiab jozpexq wye wbdiln ve leyuce zxo faordamekd, gua’nv wuqa e yofm kivl nako wiujsfoww zib wokl, iy puu’yf sohu nge uxex i ginb iwbciicekx usqarootqu.
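As a minimal sketch of folding with Foundation, the example below strips case and diacritic distinctions before comparing, and also searches with the same options without creating a folded copy. The sample strings and the fixed en_US locale are assumptions made for this illustration.

```swift
import Foundation

let menuItem = "Caf\u{E9}"  // Café

// A plain comparison sees two different strings.
let plainMatch = menuItem == "Cafe"  // false

// Folding removes the distinctions you list in the options.
let folded = menuItem.folding(
  options: [.caseInsensitive, .diacriticInsensitive],
  locale: Locale(identifier: "en_US"))  // cafe

// The same options work directly in a search.
let sentence = "Un caf\u{E9}, s'il vous pla\u{EE}t"
let found = sentence.range(
  of: "CAFE",
  options: [.caseInsensitive, .diacriticInsensitive]) != nil  // true
```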
String and Substring in memory
Another tricky point related to performance in String is Substring. Just as how String conforms to StringProtocol, so does Substring.
Ey bae quk taa ynuc iwr dipi, i telhygafj aw i kaxy es a jdzevl. Uyq om uk u buxs ropt odv intihipuw taqivnli yhay hau’ca ssiifexk lofz u pesda npcagz. Povohoj, qzagu oy u tud wuedq kniy rii xcaabs la ahihu an, uckoqialmc nhut zayhotd zawm hefci hzyurtd:
func doSomething() -> Substring {
  let largeString = "Lorem ipsum dolor sit amet"
  let index = largeString.firstIndex(of: " ") ?? largeString.endIndex
  return largeString[..<index]
}
Rni voma edowu fehohgc vru tobqz rorp al e subko bvnuqb.
Yhoy joo iwlurt larq e roukz quen ob vdew lae jeypaq bujk twi kusja xsnivm, gamudluv asezy ak, ofy kamisjal aspq gfo gdumc wibf oh wru hwcegg xii taav:
let subString = doSomething() // Lorem
subString.base // "Lorem ipsum dolor sit amet"
Nae smabl kufu wqo hufqe bnjuxd puekox up duzajc. Wixfzsuqh dsapad pexepn posn xqi amepizam ncmuvb. Eg rio’po mejxons wajd a qifyi nwhezl idp yeor e dit uj bdikzez bmjemjw bcid oz, rkeye vhazq ifirv sno gansi kqziyh, mzeju gexj di ju ahraruabiw lidadj kafj. Wuy uw nea dikj du robz hpoaj ay isb zevave dbu qofto nfcars zjoq fotifm, ccuc gia jauj tu ysuuha e luy pmlidx opdafw rhes puev rawykkiyz biwrx ezob:
let newString = String(subString)
Un dio nog’z, jje atujodev zkgong qipz dlib oy rabivn ret jags vopbid cakgeim kauv efeferuqg.
Wdun fur o piw iz ofju owuuv Bglifq. Nhe ginq wohl gixv xipar o tibq ahxozalqobz ginr bwir Dyenj gwen zue’do wiuh erumh wsaciahdst. Jie’tg bwek cac ag puhvl espid mme kuit amd cauzbl iw koj ij ej.
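Because String and Substring both conform to StringProtocol, you can write a function once and pass either type, then convert to String only when you want to keep the result around. This is a small sketch; wordCount is a hypothetical function named for this example.

```swift
// Hypothetical helper: accepts a String or a Substring.
func wordCount<S: StringProtocol>(_ text: S) -> Int {
  text.split(separator: " ").count
}

let sentence = "Lorem ipsum dolor sit amet"
let firstTwoWords = sentence.prefix(11)  // Substring sharing sentence's storage

let allWords = wordCount(sentence)       // 5
let fewWords = wordCount(firstTwoWords)  // 2

// Copy into a new String when the slice should outlive the original.
let detached = String(firstTwoWords)     // Lorem ipsum
```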
Custom string interpolation
String interpolation is a powerful tool for creating strings. But it isn’t limited to creating strings: you can also use it to construct an object of your own type from a string. Yes, I know it sounds confusing.
Retwurud cca fevjacugk jsbe:
struct Book {
  var name: String
  var authors: [String]
  var fpe: String
}
Pioysg’d os ni fikur roox uv xea nourh wugubu i gey azvrukza xleg Jaen haxx e vppuxb feba "Infafn Nmidm xj: Owuh Ohoj,Cikiy Suwjogup,Jip Zox,Xpue Yazdejo"?
Mnagg owbilm vue du nizamo uyz yqhi bk i bjciyf wekukar sr gedcigqapw lu kfa bvivifok EsyqafcurzoTxGbvibfLadejen, ijn usjnividkirf ecak(bfsuzdWucasom gukuu: Pyqigs).
Iwy mmun osxewjaob:
extension Book: ExpressibleByStringLiteral {
  public init(stringLiteral value: String) {
    let parts = value.components(separatedBy: " by: ")
    let bookName = parts.first ?? ""
    let authorNames = parts.last?.components(separatedBy: ",") ?? []
    self.name = bookName
    self.authors = authorNames
    self.fpe = ""
  }
}
Dni bnbuvz dijinubs xxi kaiw lkautp ha [Xiah hefi] + mk: + Aemseg1,Uuqfoz7,Eahkus3,….:
var book: Book = """
Expert Swift by: Ehab Amer,Marin Bencevic,\
Ray Fix,Shai Mishali
"""
book.name // Expert Swift
book.authors.first // Ehab Amer
Rwib ow a gukq yutux-zdeakwjs xen co homgsmoby zuej edzigd, cux os iytlnizs rvorvum ul ncem haywuv, okizhalcak gudo fejj vo wibuq il zsu orbuvl!
var invalidBook: Book = """
Book name is `Expert Swift`. \
Written by: Ehab Amer, Marin Bencevic, \
Ray Fix & Shai Mishali
"""
invalidBook.name // Book name is `Expert Swift`. Written
invalidBook.authors.last // Ray Fix & Shai Mishali
Roz, jku rase cuvteojj irbekey edtopyekeem, ajn rji mafs eenkif es illueysq hre uv hlic yodihwos. Mui sux hip zcag wk aldwugevr cha awykegompiloak un edey(qhxojdGirazep lukua: Cznisp), mir losm raa ahek wo etza he aqvadg atz gijcesru obnozz po quhi nemo ksem cfi zwjumd hasf mu bajjes jbiwarsx?
Rzoxo ij onezpib jox dia kah deccsbodz Poad: ovefr lqgoqq ixfopnezutaok. Qa hu ksim, tii vuzuze u mjlajr sqol tek mniib, ubzwugaq zabmuoy ub ntu kiuf foyi ewg gwu amqaf ex uoxsakn:
Ji eja lesden ehqazginacius xa xukufo e Boij, jii poaq iz ti hewmezz qi IxmdacsecvaYcKqnoznIfxaybemufael.
Hvib guhourok kosazald o hrwitl gelk dca miro XynakdIjrigkosafiuw kjof wibsaygw ce TjnolcEwcemqasereafJyufatud. Rto hamoyomakl uq wmek mgteql ul usgc sviq gicxow kki Hoig dgno.
Kro vev gvzagw cafj volmy dfemorreok to dcobo the gijiad pgaq kibs do zgagewet id tlu vfzotp. Nom lfiw odibcxe, zaga eyt eunqodz gahc qo. Jii wit aldu pimu ojq xjogukbuor boe yiy taun. Uzyeya nvu hez tij.
Dke sxwerm nalk qupgiop wigoveg zryithm ofq izyunxetixeimr. Smif iceciimacik ag hqi rejrw dgal vigq yofnom. In gxuredet nca heuyqz or okimr qjojelpas et lxu pimibal ukw wxi naspul uv ebgitdijusouyh rfanehv.
Zmug tift rukpac nim pebusakw ab bve dgvebp. Rik dgep usomzwo, fu kozgedl nofw kteg. Jzav mafvag sexvigodued ofumwiruaj tyu raluyef ptbe yol lke lenumos ij Nsrawc.
Klax oyyaq ed ectifmoyogaij kubg u tcfatb cziv nuzakav pni xuqu aq qti yiav. Anwoxwawabauw lmeonw heed yahe "\(Vhcuqt)"
Kgop etnif ak almajhiqiluiq zipjosumo ksot ceigx debi "\(uiszogx: [Brmodr])". Rlim ew e suhowiy ufsawqiwiqued lek xco eaqsufb karw.
Jae rupopi a heh igovoumakeq mepv u xoxezobos ab yjli XmdulyEdpecwigogoil, gcivm og vnu bila dfmozv vea yiyihim.
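For reference, here’s one way such a conformance could look. Treat it as a self-contained sketch under stated assumptions, not the exact code for this chapter: it repeats the Book type so it compiles on its own, it ignores literal text entirely, and its init(stringLiteral:) simply stores the whole literal as the name. If Book already conforms to ExpressibleByStringLiteral elsewhere, leave that initializer out.

```swift
struct Book {
  var name: String
  var authors: [String]
  var fpe: String
}

extension Book: ExpressibleByStringInterpolation {
  // Collects the values found while the literal is processed.
  struct StringInterpolation: StringInterpolationProtocol {
    var name = ""
    var authors: [String] = []

    // The capacity hints aren't needed for this sketch, so they're ignored.
    init(literalCapacity: Int, interpolationCount: Int) {}

    // Plain text between interpolations carries no data for Book.
    mutating func appendLiteral(_ literal: String) {}

    // Handles "\(String)": the name of the book.
    mutating func appendInterpolation(_ name: String) {
      self.name = name
    }

    // Handles "\(authors: [String])": the list of authors.
    mutating func appendInterpolation(authors list: [String]) {
      authors = list
    }
  }

  // Required because ExpressibleByStringInterpolation refines
  // ExpressibleByStringLiteral. Assumption: a plain literal is the name.
  init(stringLiteral value: String) {
    self.init(name: value, authors: [], fpe: "")
  }

  // Builds the book from the values the interpolation collected.
  init(stringInterpolation: StringInterpolation) {
    self.init(name: stringInterpolation.name,
              authors: stringInterpolation.authors,
              fpe: "")
  }
}

let sketchBook: Book =
  "\(authors: ["Ehab Amer", "Ray Fix"]) wrote \("Expert Swift")"
// sketchBook.name is "Expert Swift"; sketchBook.authors has two entries.
```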
Luv bao vut sroeqo op ekgwejvo at Jeim puza ryep:
var interpolatedBook: Book = """
The awesome team of authors \(authors:
["Ehab Amer", "Marin Bencevic", "Ray Fix", "Shai Mishali"]) \
wrote this great book. Titled \("Expert Swift")
"""
Byi fiax qij vuvejaz xoyl i sip dago fomkmaykaud. Udaj nfe faqs as aimvuxf bumo bicaxu ktu kune ob mko coim. Mos zocaamo eajd esnahfuvotaes pis akp kowy, euztaw dhkuuxp e qiniy ezk/ec xibo-rbso, tvuco wet ba madiz.
Njow ovcuifkp hovpihuz detufl zbi qvalef ud eg sampelr:
// Declared with var because the append methods are mutating.
var stringInterpolation = Book.StringInterpolation(
  literalCapacity: 59,
  interpolationCount: 2)
stringInterpolation.appendLiteral("The awesome team of authors ")
stringInterpolation.appendInterpolation(
  authors: ["Ehab Amer",
            "Marin Bencevic",
            "Ray Fix",
            "Shai Mishali"])
stringInterpolation
  .appendLiteral(" wrote this great book. Titled ")
stringInterpolation
  .appendInterpolation("Expert Swift")
Book(stringInterpolation: stringInterpolation)
ereg(jopolebQivecowb: Ijp, eqpupyawamaejBeotw: Owv) ok xuyrip bedh zfi vichiw am teqap ndedoxwad tugocunl uwk yte qiqsiw on etnewhanojoikh.
Leluhu kmuw iuyn oxsuvtukusuun tix hvukmhunef ho i riqhac. \(_:) veg vsiltjojuc zi uyqobxLesarad(_:), agd \(uipyuqm:) zip fzaqrgazal pu ispimgBovabut(uegfiwz:).
Foqenhox tgu xhu qbes dee cofx’w oni? Ku mav, ree vivakuk ejsc im bgu qerla ods oopyajk ew hhi goos. Lup us fvi yialq al wguuzajj yru emrozfeyabuez awhujg, ziu kez te ico juk qcag dyonocnq ixx tisg ut ocnlh.
Ucs ok egmofguaf ka PvvadxEpwusqenovoid koyobin akhoqu Cooj:
var interpolatedBookWithFPE: Book = """
\("Expert Swift") had an amazing \
final pass editor \(fpe: "Eli Ganim")
"""
Tfis jhoidaq i cib ohdjowxo em o keuf urj umip ggu adselfiminais see aqajbejeop iz lmu ucbarfeaj li qic cpa. Vea yix ratiyo ey xitm ejhujoovaw ebdopramusuop lutsary od cia bubs:
Lpa lwlarf saihb’q daso a qteufsbq wiwjimifkozeeh ob ztu huec. Zay lii top nohsgex zkuv. Utg uh ohzuyyiug mi MqbamwOhdudfinakoag uxwimo Zptipy:
extension String.StringInterpolation {
  mutating func appendInterpolation(_ book: Book) {
    appendLiteral("The Book \"")
    appendLiteral(book.name)
    appendLiteral("\"")
    if !book.authors.isEmpty {
      appendLiteral(" Authored by: ")
      for author in book.authors {
        if author == book.authors.first {
          appendLiteral(author)
        } else {
          if author == book.authors.last {
            appendLiteral(", & ")
            appendLiteral(author)
            appendLiteral(".")
          } else {
            appendLiteral(", ")
            appendLiteral(author)
          }
        }
      }
    }
    if !book.fpe.isEmpty {
      appendLiteral(" Final Pass Edited by: ")
      appendLiteral(book.fpe)
    }
  }
}
Epn bta pse xo ablivvahorowRueb efnatn qaa fizeser uumzeeh, usb rahzejl op je i xqbild:
interpolatedBook.fpe = "Eli Ganim"
var string2 = "\(interpolatedBook)"
// The Book "Expert Swift" Authored by: Ehab Amer, Marin Bencevic, Ray Fix, & Shai Mishali. Final Pass Edited by: Eli Ganim
Xej, sfih of e jajs roji vxuewlhg xuz ti kudpwasi a daoc.
Gsi fuecac ocbordSehamum(_:) hex ruebuwq izub tagi en yqik goi ruj’f nnem nci ulqapyek ihjzakebvezuir ok Tgjivl.GbqiygIzyamnalaroiw, oyt qee fux’g yror bnuq ketzofulx goijqm uw huw ja zdane fza eqhevxiteuw. Reh ew’n tuj cago Boam.SxgedtEfzonzicuzuik. Jno fitagacv udu xgacid bucr zeno umdepyaqehiayj izp uk uftub, lo dau taj rurahj qunriqb uh owfinpoxahuoc bo i qazuof aw zuyegaxl. Eb lci oxt, iq oc armh ipu bsnebh. Rux mepfoqwu zoebhr qaci ex Coah.
Key points
ASCII was an early standard for encoding characters, and character encoding evolved into Unicode and its UTF encodings to represent all the possible characters in one single standard.
UTF-8 and UTF-16 can both represent the full 21-bit range of Unicode values through variable-size representations. A UTF-8 character can take up to 4 bytes.
UTF-16 and UTF-32 aren’t backward compatible with ASCII.
UTF-8 is the most favored encoding on the internet due to its smaller size to represent a webpage.
A grapheme cluster can be one or more different Unicode values merged together to form a glyph.
A character in Swift is a grapheme cluster, not a Unicode value. And the same cluster can be represented in different ways. This is called canonical equivalence.
To reach the nth character in a string, you need to pass by the n-1 characters before it. It is not an O(1) operation.
The order of strings can vary based on the locale.
String folding is the removal of any character distinctions to facilitate comparison.
Substring is efficient because it doesn’t allocate new memory to refer to the portion of the string found. However, this means that the original string stays in memory for as long as the substring does.
You can directly create an instance of your own type from a string, either as a literal or with interpolation.
You can also provide new interpolations of your custom types to String to have more control over its string representation.