Column Analysis

Get a compact, numerical summary of all analyses carried out. Visualizations and key figures provide a quick overview of the main results and developments, ideal for evaluation at a glance.

Constant Object
Column: epitope_type
Number of Values: 99999
Number of Unique: 1
Number of Missing: 0 (0.0%)
Number type: str
Value Count Frequency
Linear peptide 99999 1.0
Statistic Value
Unique Categories 1
Mode Linear peptide
Entropy -0.0
Gini Coefficient 0.0
Simpson Diversity 1.0
Min Category Length 14
Max Category Length 14
Memory 2999970
Cardinality Ratio 0.0
Sequence
Column: peptide
Number of Values: 99999
Number of Unique: 18293
Number of Missing: 0 (0.0%)
Number type: str
Sequence type: protein
Top 20 Protein Sequences Analysis
Rank Sequence Count Length (AA) Residue Frequency Molecular Weight (Da) Isoelectric Point (pI)
1 FLPSDFFPSV 48 10 Asp 10.00 %, Leu 10.00 %, Phe 30.00 %, Pro 20.00 %, Ser 20.00 %, Val 10.00 % 1155.3 3.75
2 FPRCRYVHK 48 9 Arg 22.22 %, Cys 11.11 %, His 11.11 %, Lys 11.11 %, Phe 11.11 %, Pro 11.11 %, Tyr 11.11 %, Val 11.11 % 1205.4 10.46
3 SRYWAIRTR 46 9 Ala 11.11 %, Arg 33.33 %, Ile 11.11 %, Ser 11.11 %, Thr 11.11 %, Trp 11.11 %, Tyr 11.11 % 1208.4 12.20
4 ESAERLKAY 42 9 Ala 22.22 %, Arg 11.11 %, Glu 22.22 %, Leu 11.11 %, Lys 11.11 %, Ser 11.11 %, Tyr 11.11 % 1066.2 6.53
5 ILKEPVHGV 40 9 Glu 11.11 %, Gly 11.11 %, His 11.11 %, Ile 11.11 %, Leu 11.11 %, Lys 11.11 %, Pro 11.11 %, Val 22.22 % 991.2 7.55
6 FRLMRTNFL 38 9 Arg 22.22 %, Asn 11.11 %, Leu 22.22 %, Met 11.11 %, Phe 22.22 %, Thr 11.11 % 1197.5 12.50
7 RTLLGLILFV 36 10 Arg 10.00 %, Gly 10.00 %, Ile 10.00 %, Leu 40.00 %, Phe 10.00 %, Thr 10.00 %, Val 10.00 % 1144.5 10.55
8 IIIPFIAYFV 36 10 Ala 10.00 %, Ile 40.00 %, Phe 20.00 %, Pro 10.00 %, Tyr 10.00 %, Val 10.00 % 1195.5 6.09
9 GILGFVFTL 36 9 Gly 22.22 %, Ile 11.11 %, Leu 22.22 %, Phe 22.22 %, Thr 11.11 %, Val 11.11 % 966.2 6.10
10 ETIFTVLAL 35 9 Ala 11.11 %, Glu 11.11 %, Ile 11.11 %, Leu 22.22 %, Phe 11.11 %, Thr 22.22 %, Val 11.11 % 1006.2 3.85
11 FVIGGMTGV 35 9 Gly 33.33 %, Ile 11.11 %, Met 11.11 %, Phe 11.11 %, Thr 11.11 %, Val 22.22 % 880.1 6.10
12 WLSLLVPFV 34 9 Leu 33.33 %, Phe 11.11 %, Pro 11.11 %, Ser 11.11 %, Trp 11.11 %, Val 22.22 % 1073.3 6.10
13 KLVALGINAV 34 10 Ala 20.00 %, Asn 10.00 %, Gly 10.00 %, Ile 10.00 %, Leu 20.00 %, Lys 10.00 %, Val 20.00 % 997.2 9.70
14 EIPQFMIGL 34 9 Gln 11.11 %, Glu 11.11 %, Gly 11.11 %, Ile 22.22 %, Leu 11.11 %, Met 11.11 %, Phe 11.11 %, Pro 11.11 % 1047.3 3.85
15 EIINNGISY 33 9 Asn 22.22 %, Glu 11.11 %, Gly 11.11 %, Ile 33.33 %, Ser 11.11 %, Tyr 11.11 % 1022.1 3.85
16 EVFEIIRSY 32 9 Arg 11.11 %, Glu 22.22 %, Ile 22.22 %, Phe 11.11 %, Ser 11.11 %, Tyr 11.11 %, Val 11.11 % 1155.3 4.26
17 DTTTDISKY 32 9 Asp 22.22 %, Ile 11.11 %, Lys 11.11 %, Ser 11.11 %, Thr 33.33 %, Tyr 11.11 % 1043.1 4.11
18 GLYEAIEEC 32 9 Ala 11.11 %, Cys 11.11 %, Glu 33.33 %, Gly 11.11 %, Ile 11.11 %, Leu 11.11 %, Tyr 11.11 % 1026.1 3.47
19 LLTEVETYV 32 9 Glu 22.22 %, Leu 22.22 %, Thr 22.22 %, Tyr 11.11 %, Val 22.22 % 1066.2 3.61
20 WLWVSSSDM 32 9 Asp 11.11 %, Leu 11.11 %, Met 11.11 %, Ser 33.33 %, Trp 22.22 %, Val 11.11 % 1110.2 3.75
Length Statistics
Min Length: 9 AA
Mean Length: 9.2 AA
Max Length: 10 AA
Molecular Weight Statistics
Min MW: 880.1 Da
Mean MW: 1077.9 Da
Max MW: 1208.4 Da
Isoelectric Point Statistics
Min pI: 3.47
Mean pI: 6.42
Max pI: 12.50
Hydrophobicity Statistics
Min Hydrophobicity: -1.211
Mean Hydrophobicity: 0.675
Max Hydrophobicity: 2.670
Charge Statistics
Min Charge: -3.06
Mean Charge: -0.10
Max Charge: 3.03
Amino Acid Composition (Aggregated from Top 20)
Amino Acid Three-Letter Count Percentage
L Leu 23 12.50%
I Ile 20 10.87%
V Val 18 9.78%
F Phe 16 8.70%
E Glu 13 7.07%
T Thr 12 6.52%
S Ser 11 5.98%
G Gly 11 5.98%
R Arg 10 5.43%
Y Tyr 9 4.89%
A Ala 8 4.35%
P Pro 7 3.80%
K Lys 5 2.72%
D Asp 4 2.17%
W Trp 4 2.17%
M Met 4 2.17%
N Asn 4 2.17%
C Cys 2 1.09%
H His 2 1.09%
Q Gln 1 0.54%
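
The aggregated composition above can be cross-checked by pooling the residues of the 20 listed sequences and counting them. The following is a minimal sketch under that assumption, not the report generator's own code. The names `TOP20` and `aggregate_composition` are hypothetical, and each sequence is counted once rather than weighted by its occurrence count.

```python
from collections import Counter

# The 20 most frequent peptides, copied from the Top 20 table.
TOP20 = [
    "FLPSDFFPSV", "FPRCRYVHK", "SRYWAIRTR", "ESAERLKAY", "ILKEPVHGV",
    "FRLMRTNFL", "RTLLGLILFV", "IIIPFIAYFV", "GILGFVFTL", "ETIFTVLAL",
    "FVIGGMTGV", "WLSLLVPFV", "KLVALGINAV", "EIPQFMIGL", "EIINNGISY",
    "EVFEIIRSY", "DTTTDISKY", "GLYEAIEEC", "LLTEVETYV", "WLWVSSSDM",
]


def aggregate_composition(sequences):
    """Return (residue, count, percentage) rows, most common residue first."""
    counts = Counter("".join(sequences))  # pool all residues, one copy per sequence
    total = sum(counts.values())
    return [(aa, n, 100.0 * n / total) for aa, n in counts.most_common()]


rows = aggregate_composition(TOP20)
for aa, n, pct in rows:
    print(f"{aa} {n} {pct:.2f}%")
```

Pooled this way, the 20 sequences contain 184 residues, 23 of which are Leu (12.50%), in line with the first row of the table.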
Rank Sequence Length Hydrophobicity Charge Aliphatic Index Boman Index Aromaticity Instability
1 FLPSDFFPSV 10 0.810 -1.00 68.00 -0.24 0.300 78.50
2 FPRCRYVHK 9 -1.056 3.03 32.22 3.54 0.222 39.84
3 SRYWAIRTR 9 -1.211 3.00 54.44 4.65 0.222 -6.31
4 ESAERLKAY 9 -1.122 0.00 65.56 3.23 0.111 20.86
5 ILKEPVHGV 9 0.456 0.09 151.11 -0.20 0.000 52.13
6 FRLMRTNFL 9 0.211 2.00 86.67 2.32 0.222 -32.89
7 RTLLGLILFV 10 2.110 1.00 224.00 -1.51 0.100 28.26
8 IIIPFIAYFV 10 2.670 -0.00 195.00 -3.13 0.300 25.38
9 GILGFVFTL 9 2.267 -0.00 162.22 -2.67 0.222 20.86
10 ETIFTVLAL 9 1.778 -1.00 173.33 -1.29 0.111 8.89
11 FVIGGMTGV 9 1.744 -0.00 107.78 -2.06 0.111 9.97
12 WLSLLVPFV 9 2.144 -0.00 194.44 -2.75 0.222 65.40
13 KLVALGINAV 10 1.630 1.00 195.00 -1.52 0.000 -7.98
14 EIPQFMIGL 9 0.944 -1.00 130.00 -0.96 0.111 40.11
15 EIINNGISY 9 0.056 -1.00 130.00 0.88 0.111 4.16
16 EVFEIIRSY 9 0.267 -1.00 118.89 1.69 0.222 79.11
17 DTTTDISKY 9 -1.178 -1.00 43.33 3.26 0.111 -7.81
18 GLYEAIEEC 9 0.044 -3.06 97.78 0.74 0.111 134.38
19 LLTEVETYV 9 0.700 -2.00 151.11 0.11 0.111 30.29
20 WLWVSSSDM 9 0.244 -1.00 75.56 0.33 0.222 82.28
Additional Property Statistics
Aliphatic Index from 32.22 to 224.00
Boman Index from -3.13 to 4.65
Aromaticity from 0.000 to 0.300
Instability Index from -32.89 to 134.38
Property Interpretation

Aliphatic Index: Relative volume of aliphatic side chains (higher = more thermostable)

Boman Index: Protein-protein interaction potential

Aromaticity: Fraction of aromatic amino acids (Phe, Trp, Tyr)

Instability Index: <40 = stable, >40 = unstable protein
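
Three of the per-sequence properties can be reproduced with short formulas. The sketch below is an assumption about how the values were derived, not the tool's actual implementation. Aromaticity is the fraction of Phe, Trp and Tyr. The aliphatic index uses Ikai's formula. The Hydrophobicity column appears consistent with the mean Kyte-Doolittle hydropathy (GRAVY). The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy scale (standard published values).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def aromaticity(seq):
    """Fraction of aromatic residues (Phe, Trp, Tyr)."""
    return sum(seq.count(aa) for aa in "FWY") / len(seq)


def aliphatic_index(seq):
    """Ikai's aliphatic index: 100 * (x_Ala + 2.9 x_Val + 3.9 (x_Ile + x_Leu))."""
    def frac(aa):
        return seq.count(aa) / len(seq)
    return 100.0 * (frac("A") + 2.9 * frac("V") + 3.9 * (frac("I") + frac("L")))


def mean_hydropathy(seq):
    """Mean Kyte-Doolittle hydropathy over all residues (GRAVY)."""
    return sum(KD[aa] for aa in seq) / len(seq)


for seq in ("FLPSDFFPSV", "RTLLGLILFV"):
    print(seq, round(aromaticity(seq), 3),
          round(aliphatic_index(seq), 2), round(mean_hydropathy(seq), 3))
```

For FLPSDFFPSV this gives an aromaticity of 0.3, an aliphatic index of 68.0 and a mean hydropathy of 0.81, matching the rank 1 row of the property table.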

Most Common Peptide K-mers per Sequence

Rank Sequence: Peptide K-mer Count
1 FLPSDFFPSV: DFF 1, FFP 1, FLP 1, FPS 1, LPS 1
2 FPRCRYVHK: CRY 1, FPR 1, PRC 1, RCR 1, RYV 1
3 SRYWAIRTR: AIR 1, IRT 1, RTR 1, RYW 1, SRY 1
4 ESAERLKAY: AER 1, ERL 1, ESA 1, KAY 1, LKA 1
5 ILKEPVHGV: EPV 1, HGV 1, ILK 1, KEP 1, LKE 1
6 FRLMRTNFL: FRL 1, LMR 1, MRT 1, NFL 1, RLM 1
7 RTLLGLILFV: GLI 1, ILF 1, LFV 1, LGL 1, LIL 1
8 IIIPFIAYFV: AYF 1, FIA 1, IAY 1, III 1, IIP 1
9 GILGFVFTL: FTL 1, FVF 1, GFV 1, GIL 1, ILG 1
10 ETIFTVLAL: ETI 1, FTV 1, IFT 1, LAL 1, TIF 1
11 FVIGGMTGV: FVI 1, GGM 1, GMT 1, IGG 1, MTG 1
12 WLSLLVPFV: LLV 1, LSL 1, LVP 1, PFV 1, SLL 1
13 KLVALGINAV: ALG 1, GIN 1, INA 1, KLV 1, LGI 1
14 EIPQFMIGL: EIP 1, FMI 1, IGL 1, IPQ 1, MIG 1
15 EIINNGISY: EII 1, GIS 1, IIN 1, INN 1, ISY 1
16 EVFEIIRSY: EII 1, EVF 1, FEI 1, IIR 1, IRS 1
17 DTTTDISKY: DIS 1, DTT 1, ISK 1, SKY 1, TDI 1
18 GLYEAIEEC: AIE 1, EAI 1, EEC 1, GLY 1, IEE 1
19 LLTEVETYV: ETY 1, EVE 1, LLT 1, LTE 1, TEV 1
20 WLWVSSSDM: LWV 1, SDM 1, SSD 1, SSS 1, VSS 1
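
The k-mer lists above look consistent with taking every overlapping 3-residue window of a sequence, counting the windows, and keeping the first five after sorting by descending count and then alphabetically. This is an inference from the listing rather than documented behaviour. The sketch below illustrates it with a hypothetical helper `top_kmers`.

```python
from collections import Counter


def top_kmers(seq, k=3, n=5):
    """Return the n most frequent k-mers of seq; ties are broken alphabetically."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


print(top_kmers("FLPSDFFPSV"))
# [('DFF', 1), ('FFP', 1), ('FLP', 1), ('FPS', 1), ('LPS', 1)]
```

Because peptides of 9 to 10 residues rarely repeat a 3-mer, every count is 1 and the selection is effectively alphabetical.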
Missing Object
Column: Reference_Name
Number of Values: 59434
Number of Unique: 16106
Number of Missing: 40565 (40.57%)
Number type: str
Value Count Frequency
HPVG analog 65 0.0010936500992697782
3033.0002 30 0.0005047615842783591
3033.0009 30 0.0005047615842783591
3033.0016 29 0.00048793619813574724
3033.0043 27 0.00045428542585052327
71020 25 0.00042063465356529934
71444 24 0.00040380926742268733
71016 24 0.00040380926742268733
71013 24 0.00040380926742268733
71027 24 0.00040380926742268733
7014.0005 23 0.00038698388128007537
73567 23 0.00038698388128007537
70964 23 0.00038698388128007537
76951 23 0.00038698388128007537
71017 23 0.00038698388128007537
7014.0001 22 0.0003701584951374634
7014.0002 22 0.0003701584951374634
7014.0003 22 0.0003701584951374634
7014.0004 22 0.0003701584951374634
7014.0006 22 0.0003701584951374634
Statistic Value
Unique Categories 16106
Mode HPVG analog
Entropy 13.55
Gini Coefficient 1.0
Simpson Diversity 9565.43
Min Category Length 1.0
Max Category Length 60.0
Memory 2122293
Cardinality Ratio 0.271
Missing Number High Correlation
Column: start
Number of Values: 97812
Number of Unique: 2552
Number of Missing: 2187 (2.19%)
Number type: float64
Correlates with:
End : 0.9999999322390251
Value Count Frequency
1.0 485 0.004958491800597064
6.0 425 0.004345070134543819
18.0 421 0.004304175356806936
2.0 416 0.004253056884635832
7.0 410 0.0041917147180305075
4.0 406 0.004150819940293624
17.0 367 0.0037520958573590154
8.0 361 0.003690753690753691
74.0 360 0.00368052999631947
47.0 359 0.0036703063018852494
35.0 350 0.0035782930519772625
54.0 348 0.003557845663108821
41.0 345 0.003527174579806159
11.0 341 0.0034862798020692757
12.0 338 0.0034556087187666137
39.0 332 0.003394266552161289
56.0 331 0.003384042857727068
53.0 328 0.003353371774424406
66.0 323 0.0033022533022533025
70.0 321 0.003281805913384861
Statistic Value
Minimum 1.0
Maximum 7025.0
Mean 596.07
Median 223.0
Mode 1.0
Standard Deviation 1117.58
Sum 58303118.0
Kurtosis 12.64
Skewness 3.46
Median Abs Deviation 158.5
Coefficient of Variation 1.87
Quantiles 25 %: 90.0, 50 %: 223.0, 75 %: 489.0
Memory 1599984
Missing Number High Correlation
Column: end
Number of Values: 97812
Number of Unique: 2543
Number of Missing: 2187 (2.19%)
Number type: float64
Correlates with:
Start : 0.9999999322390251
Value Count Frequency
9.0 443 0.004529096634359792
15.0 403 0.004120148856990962
27.0 398 0.004069030384819859
14.0 391 0.0039974645237803134
12.0 389 0.003977017134911872
25.0 385 0.003936122357174989
10.0 382 0.0039054512738723266
16.0 373 0.00381343802396434
26.0 356 0.003639635218582587
21.0 352 0.003598740440845704
61.0 347 0.0035476219686746
55.0 346 0.0035373982742403793
49.0 340 0.003476056107635055
64.0 340 0.003476056107635055
13.0 331 0.003384042857727068
50.0 331 0.003384042857727068
42.0 330 0.0033738191632928477
48.0 330 0.0033738191632928477
111.0 327 0.0033431480799901853
43.0 326 0.0033329243855559645
Statistic Value
Minimum 8.0
Maximum 7034.0
Mean 604.27
Median 231.0
Mode 9.0
Standard Deviation 1117.59
Sum 59104685.0
Kurtosis 12.64
Skewness 3.46
Median Abs Deviation 158.0
Coefficient of Variation 1.85
Quantiles 25 %: 98.0, 50 %: 231.0, 75 %: 497.0
Memory 1599984
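
The near-perfect correlation between start and end is what one would expect if end equals start plus peptide length minus 1. Peptide lengths vary by only a few residues while positions span thousands, and the two means differ by 8.2, which is the mean length of 9.2 minus 1. The sketch below illustrates this with a plain Pearson coefficient. The toy positions and lengths are hypothetical, not taken from the dataset, and the end = start + length - 1 relation is an assumption.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical toy data: start positions and peptide lengths (8 to 11 residues).
starts = [1, 6, 18, 74, 223, 489, 1500, 7025]
lengths = [9, 10, 9, 9, 10, 9, 11, 10]
ends = [s + n - 1 for s, n in zip(starts, lengths)]  # assumed relation

r = pearson(starts, ends)
print(r)  # very close to 1
```

With a correlation this high, one of the two columns carries almost no information beyond the other plus the length column.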
Missing Object
Column: protein
Number of Values: 98077
Number of Unique: 3628
Number of Missing: 1922 (1.92%)
Number type: str
Value Count Frequency
Replicase polyprotein 1ab 7109 0.07248386471853747
L protein 2813 0.02868154613212068
Hypothetical protein 2029 0.020687826911508306
Genome polyprotein 1929 0.019668219868062848
nucleoprotein NP 1806 0.018414103204624936
Nucleoprotein 1646 0.01678273193511221
glycoprotein 1474 0.015029007820386023
involved in plaque and EEV formation 1472 0.015008615679517114
Spike glycoprotein precursor 1217 0.012408617718731202
Major core protein P4a precursor 1095 0.011164697125727745
Protein G7 1075 0.010960775717038653
Protein K4 1054 0.010746658237915107
unknown 1044 0.010644697533570563
structural glycoprotein 935 0.009533325856215014
nucleocapsid protein 892 0.009094894827533468
Hypothetical protein VACWR082 891 0.009084698757099014
glycoprotein precursor 884 0.009013326264057831
Serine proteinase inhibitor 2 873 0.008901169489278831
36 kDa late protein I1 838 0.008544307024072922
Hypothetical protein VACWR050 833 0.008493326671900649
Statistic Value
Unique Categories 3628
Mode Replicase polyprotein 1ab
Entropy 9.05
Gini Coefficient 0.99
Simpson Diversity 96.83
Min Category Length 2.0
Max Category Length 248.0
Memory 4097202
Cardinality Ratio 0.037
Taxonomy invalid Missing Object
Column: source_organism
Number of Values: 98077
Number of Unique: 671
Number of Missing: 1922 (1.92%)
Number type: str
Taxonomy: Invalid
Value Count Frequency
Vaccinia virus WR 21479 0.21900139686164952
SARS coronavirus Tor2 8208 0.08368934612600304
Zaire ebolavirus 4164 0.042456437289068796
Homo sapiens 3764 0.038378009115286965
Giardia lamblia ATCC 50803 3417 0.03483997267453123
SARS-CoV1 2481 0.025296450747881765
Mycobacterium tuberculosis 2395 0.024419588690518673
Escherichia coli O157:H7 1630 0.016619594808160935
Vibrio cholerae 1291 0.013163126930880838
Camelpox virus M-96 1272 0.012969401592626202
Yersinia pestis 1189 0.012123127746566473
Mammarenavirus brazilense 1046 0.01066508967443947
Vibrio vulnificus 1037 0.01057332504052938
Influenza A virus (A/Puerto Rico/8/34/Mount Sinai(H1N1)) 1035 0.01055293289966047
Mammarenavirus lassaense 1029 0.010491756477053743
Vibrio parahaemolyticus 1011 0.01030822720923356
Sudan ebolavirus 989 0.01008391365967556
Mammarenavirus guanaritoense 968 0.009869796180552014
Pseudomonas aeruginosa 943 0.00961489441969065
Marburg virus - Musoke, Kenya, 1980 934 0.00952312978578056
Taxonomy
Chlamydia trachomatis Serovar L2
H1N1 subtype Influenza A/Oklahoma/7485/01
H1N1 swine influenza virus (A/swine/Korea/S10/2004(H1N1))
H9N2 subtype Influenza A virus (A/swine/Korea/S81/2004(H9N2))
H9N2 subtype Influenza A virus (A/swine/Korea/S83/2004(H9N2))
Hepatitis B virus subtype AYR
Hepatitis delta virus TW2667
Human gammaherpesvirus 4
Human herpesvirus 4 GD1
Human herpesvirus 4 WW1
Influenza A virus H3N2 (A/Resvir-9 (H3N2))
Lymphocytic choriomeningitis virus (strain Armstrong) (clone 53b)
Monkeypox virus USA_2003_039
SARS-CoV1
Variola major virus India-1967
West Nile virus NY-99
Statistic Value
Unique Categories 671
Mode Vaccinia virus WR
Entropy 5.96
Gini Coefficient 0.94
Simpson Diversity 15.68
Min Category Length 9.0
Max Category Length 70.0
Memory 3812721
Cardinality Ratio 0.007
Constant Object
Column: host
Number of Values: 99999
Number of Unique: 1
Number of Missing: 0 (0.0%)
Number type: str
Taxonomy: Valid
Value Count Frequency
Homo sapiens (human) 99999 1.0
Statistic Value
Unique Categories 1
Mode Homo sapiens (human)
Entropy -0.0
Gini Coefficient 0.0
Simpson Diversity 1.0
Min Category Length 20
Max Category Length 20
Memory 3599964
Cardinality Ratio 0.0
Object
Column: method
Number of Values: 99999
Number of Unique: 19
Number of Missing: 0 (0.0%)
Number type: str
Value Count Frequency
purified MHC/direct/fluorescence 50233 0.5023350233502335
purified MHC/competitive/radioactivity 45128 0.45128451284512844
cellular MHC/direct/fluorescence 2177 0.021770217702177023
purified MHC/competitive/fluorescence 1463 0.014630146301463014
purified MHC/direct/radioactivity 249 0.0024900249002490025
lysate MHC/direct/radioactivity 187 0.001870018700187002
cellular MHC/competitive/fluorescence 136 0.0013600136001360014
x-ray crystallography 84 0.0008400084000840008
cellular MHC/competitive/radioactivity 70 0.0007000070000700007
secreted MHC/mass spectrometry 59 0.0005900059000590006
cellular MHC/mass spectrometry 58 0.0005800058000580006
Edman degradation 41 0.0004100041000410004
cellular MHC/direct/radioactivity 40 0.0004000040000400004
cellular MHC/T cell inhibition 28 0.0002800028000280003
coelution 18 0.00018000180001800017
T cell recognition 11 0.00011000110001100011
lysate MHC/direct/fluorescence 9 9.000090000900009e-05
purified MHC/direct/phage display 4 4.000040000400004e-05
binding assay 4 4.000040000400004e-05
Statistic Value
Unique Categories 19
Mode purified MHC/direct/fluorescence
Entropy 1.32
Gini Coefficient 0.54
Simpson Diversity 2.19
Min Category Length 9
Max Category Length 38
Memory 5076750
Cardinality Ratio 0.0
Object
Column: response_measured
Number of Values: 99999
Number of Unique: 10
Number of Missing: 0 (0.0%)
Number type: str
Value Count Frequency
dissociation constant KD (~IC50) 42795 0.4279542795427954
dissociation constant KD (~EC50) 36176 0.36176361763617637
qualitative binding 15320 0.15320153201532016
half maximal inhibitory concentration (IC50) 3768 0.03768037680376804
half life 1574 0.015740157401574015
ligand presentation 187 0.001870018700187002
3D structure 84 0.0008400084000840008
half maximal effective concentration (EC50) 77 0.0007700077000770008
50% dissociation temperature 9 9.000090000900009e-05
dissociation constant KD 9 9.000090000900009e-05
Statistic Value
Unique Categories 10
Mode dissociation constant KD (~IC50)
Entropy 1.78
Gini Coefficient 0.66
Simpson Diversity 2.95
Min Category Length 9
Max Category Length 44
Memory 4606434
Cardinality Ratio 0.0
Missing Object
Column: units
Number of Values: 84492
Number of Unique: 4
Number of Missing: 15507 (15.51%)
Number type: str
Value Count Frequency
nM 82825 0.9802703214505515
min 1574 0.01862898262557402
angstroms 84 0.0009941769634995029
°C 9 0.00010651896037494674
Statistic Value
Unique Categories 4
Mode nM
Entropy 0.15
Gini Coefficient 0.04
Simpson Diversity 1.04
Min Category Length 2.0
Max Category Length 9.0
Memory 1778681
Cardinality Ratio 0.0
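
The missing percentage and the Frequency column use different denominators. The missing share is taken over all rows, while each frequency appears to be taken over the non-missing values only. The sketch below reproduces the figures for the units column under that reading. `missing_summary` is a hypothetical helper, and the row total of 99999 is inferred from the other columns.

```python
def missing_summary(n_rows, value_counts):
    """Return (missing count, missing percentage, per-value frequencies)."""
    n_present = sum(value_counts.values())
    n_missing = n_rows - n_present
    # Frequencies are relative to the non-missing values, not to all rows.
    freqs = {value: count / n_present for value, count in value_counts.items()}
    return n_missing, 100.0 * n_missing / n_rows, freqs


units_counts = {"nM": 82825, "min": 1574, "angstroms": 84, "°C": 9}
n_missing, pct_missing, freqs = missing_summary(99999, units_counts)
print(n_missing, round(pct_missing, 2), freqs["nM"])
```

This yields 15507 missing values (15.51%) and a frequency of about 0.9803 for nM, as reported above.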
Object
Column: qualitative
Number of Values: 99999
Number of Unique: 5
Number of Missing: 0 (0.0%)
Number type: str
Value Count Frequency
Negative 58504 0.5850458504585045
Positive-Low 12928 0.12928129281292813
Positive-High 12451 0.12451124511245112
Positive-Intermediate 11441 0.11441114411144111
Positive 4675 0.04675046750467505
Statistic Value
Unique Categories 5
Mode Negative
Entropy 1.77
Gini Coefficient 0.61
Simpson Diversity 2.57
Min Category Length 8
Max Category Length 21
Memory 2662676
Cardinality Ratio 0.0
Missing Object
Column: inequality
Number of Values: 38618
Number of Unique: 3
Number of Missing: 61381 (61.38%)
Number type: str
Value Count Frequency
> 29130 0.7543114609767466
= 9486 0.24563674970221142
>= 2 5.178932104200114e-05
Statistic Value
Unique Categories 3
Mode >
Entropy 0.81
Gini Coefficient 0.37
Simpson Diversity 1.59
Min Category Length 1.0
Max Category Length 2.0
Memory 1648334
Cardinality Ratio 0.0
Missing Number
Column: quantitative
Number of Values: 86472
Number of Unique: 10904
Number of Missing: 13527 (13.53%)
Number type: float64
Value Count Frequency
20000.0 26587 0.30746368766768434
70000.0 655 0.007574706263299103
1.0 521 0.006025071699509668
10000.0 464 0.0053658987880469975
77700.0 414 0.004787676935886761
78100.0 406 0.004695161439541123
77900.0 400 0.004625774817281895
5000.0 244 0.002821722638541956
69800.0 237 0.0027407715792395226
77500.0 226 0.0026135627717642704
77800.0 222 0.0025673050235914515
69400.0 212 0.002451660653159404
69700.0 180 0.0020815986677768525
2.0 169 0.0019543898603016003
76900.0 162 0.0018734388009991674
0.4 159 0.001838745489869553
70300.0 153 0.0017693588676103247
70100.0 151 0.0017462299935239152
3.0 146 0.0016884078083078916
25000.0 146 0.0016884078083078916
Statistic Value
Minimum 0.0
Maximum 14276600.0
Mean 15882.14
Median 7610.0
Mode 20000.0
Standard Deviation 66040.92
Sum 1373360464.72
Kurtosis 26750.9
Skewness 137.97
Median Abs Deviation 7607.9
Coefficient of Variation 4.16
Quantiles 25 %: 160.0, 50 %: 7610.0, 75 %: 20000.0
Memory 1599984
Object
Column: mhc_allele
Number of Values: 99999
Number of Unique: 95
Number of Missing: 0 (0.0%)
Number type: str
Value Count Frequency
HLA-A*02:01 11156 0.11156111561115611
HLA-A*03:01 6230 0.062300623006230064
HLA-A*11:01 5744 0.05744057440574406
HLA-A*31:01 4707 0.047070470704707046
HLA-A*01:01 3984 0.03984039840398404
HLA-B*07:02 3805 0.03805038050380504
HLA-B*15:01 3390 0.03390033900339003
HLA-A*68:02 3319 0.033190331903319034
HLA-A*02:03 3122 0.03122031220312203
HLA-B*40:01 3000 0.03000030000300003
HLA-B*58:01 2817 0.02817028170281703
HLA-A*02:06 2751 0.027510275102751027
HLA-A*02:02 2723 0.027230272302723027
HLA-A*26:01 2637 0.026370263702637026
HLA-A*24:02 2624 0.026240262402624025
HLA-A*33:01 2621 0.026210262102621028
HLA-A*68:01 2621 0.026210262102621028
HLA-B*27:05 2588 0.025880258802588027
HLA-A*69:01 2352 0.023520235202352024
HLA-B*08:01 2340 0.023400234002340023
Statistic Value
Unique Categories 95
Mode HLA-A*02:01
Entropy 5.23
Gini Coefficient 0.96
Simpson Diversity 26.25
Min Category Length 6
Max Category Length 11
Memory 2694041
Cardinality Ratio 0.001
Constant Object
Column: mhc_class
Number of Values: 99999
Number of Unique: 1
Number of Missing: 0 (0.0%)
Number type: str
Value Count Frequency
I 99999 1.0
Statistic Value
Unique Categories 1
Mode I
Entropy -0.0
Gini Coefficient 0.0
Simpson Diversity 1.0
Min Category Length 1
Max Category Length 1
Memory 1699983
Cardinality Ratio 0.0
Object
Column: bind_class
Number of Values: 99999
Number of Unique: 2
Number of Missing: 0 (0.0%)
Number type: str
Value Count Frequency
Negative 71432 0.7143271432714328
Positive 28567 0.2856728567285673
Statistic Value
Unique Categories 2
Mode Negative
Entropy 0.86
Gini Coefficient 0.41
Simpson Diversity 1.69
Min Category Length 8
Max Category Length 8
Memory 2399976
Cardinality Ratio 0.0
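
The categorical statistics for bind_class appear consistent with the following definitions: Shannon entropy in bits, a Gini coefficient computed as 1 minus the sum of squared class proportions, Simpson diversity as the inverse of that sum, and cardinality ratio as unique values divided by non-missing values. These definitions are inferred from the reported numbers, not taken from the tool's documentation. `categorical_stats` is a hypothetical helper.

```python
import math


def categorical_stats(value_counts):
    """Diversity statistics for a categorical column given its value counts."""
    n = sum(value_counts.values())
    probs = [count / n for count in value_counts.values() if count > 0]
    sum_sq = sum(p * p for p in probs)
    return {
        "entropy": -sum(p * math.log2(p) for p in probs),  # Shannon entropy, bits
        "gini": 1.0 - sum_sq,                               # Gini impurity
        "simpson_diversity": 1.0 / sum_sq,                  # inverse Simpson index
        "cardinality_ratio": len(probs) / n,
    }


stats = categorical_stats({"Negative": 71432, "Positive": 28567})
print({name: round(value, 2) for name, value in stats.items()})
```

Rounded to two decimals, this gives an entropy of 0.86, a Gini coefficient of 0.41 and a Simpson diversity of 1.69. For a column with a single category, the entropy expression evaluates to negative zero in floating point, which would explain the Entropy -0.0 entries shown for constant columns.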
Number
Column: length
Number of Values: 99999
Number of Unique: 4
Number of Missing: 0 (0.0%)
Number type: int64
Value Count Frequency
9 79935 0.7993579935799358
10 19295 0.19295192951929518
11 419 0.004190041900419004
8 350 0.0035000350003500033
Statistic Value
Minimum 8.0
Maximum 11.0
Mean 9.2
Median 9.0
Mode 9.0
Standard Deviation 0.42
Sum 919774.0
Kurtosis 1.37
Skewness 1.54
Median Abs Deviation 0.0
Coefficient of Variation 0.05
Quantiles 25 %: 9.0, 50 %: 9.0, 75 %: 9.0
Memory 1599984
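
Because the length column has only four distinct values, its summary statistics can be recomputed directly from the value counts. The sketch below does so with a hypothetical helper `weighted_stats`. It assumes the sample standard deviation (n - 1 denominator) and defines the coefficient of variation as standard deviation divided by mean. Both are assumptions about the tool.

```python
import math

# Value counts for the length column, copied from the table above.
length_counts = {9: 79935, 10: 19295, 11: 419, 8: 350}


def weighted_stats(value_counts):
    """Sum, mean, sample std, CV and (lower) median from value counts."""
    n = sum(value_counts.values())
    total = sum(value * count for value, count in value_counts.items())
    mean = total / n
    var = sum(c * (v - mean) ** 2 for v, c in value_counts.items()) / (n - 1)
    std = math.sqrt(var)
    # Walk the sorted values until the cumulative count reaches the middle.
    middle, seen, median = (n + 1) // 2, 0, None
    for value in sorted(value_counts):
        seen += value_counts[value]
        if seen >= middle:
            median = value
            break
    return {"sum": total, "mean": mean, "std": std,
            "cv": std / mean, "median": median}


result = weighted_stats(length_counts)
print(result["sum"], round(result["mean"], 2), round(result["std"], 2),
      round(result["cv"], 2), result["median"])
```

The recomputed sum (919774), mean (9.2), standard deviation (0.42), coefficient of variation (0.05) and median (9) agree with the reported values.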